Letter to Student 
To the Student: 


This course and this Student Manual reflect a collective effort by your 
instructor, the Vietnam Education Foundation, the Vietnam Open 
Courseware (VOCW) Project and faculty colleagues within Vietnam and 
the United States who served as reviewers of drafts of this Student Manual. 
This course is an important component of our academic program. Although 
it has been offered for many years, this latest version represents an attempt 
to expand the range of sources of information and instruction so that the 
course continues to be up-to-date and the methods well suited to what is to 
be learned. 


This Student Manual is designed to assist you through the course by 
providing specific information about student responsibilities including 
requirements, timelines and evaluations. 


You will be asked from time to time to offer feedback on how the Student 
Manual is working and how the course is progressing. Your comments will 
inform the development team about what is working and what requires 
attention. Our goal is to help you learn what is important about this 
particular field and, eventually, to succeed as a professional applying what 
you learn in this course. 


Thank you for your cooperation. 


Tuan Do-Hong. 


Contact Information 


Faculty Information: Department of Telecommunications Engineering, 
Faculty of Electrical and Electronics Engineering, Ho Chi Minh City 
University of Technology 


Instructor: Dr.-Ing. Tuan Do-Hong 

Office Location: Ground floor, B3 Building 
Phone: +84 (0) 8 8654184 

Email: do-hong@hcmut.edu.vn 

Office Hours: 9:00 am — 5:00 pm 
Assistants: 

Office Location: Ground floor, B3 Building 
Phone: +84 (0) 8 8654184 

Email: 

Office Hours: 9:00 am — 5:00 pm 


Lab sections/support: 


Resources 

Connexions: http://cnx.org/ 

MIT’s OpenCourseWare: http://ocw.mit.edu/index.html 
Computer resource: Matlab and Simulink 

Textbook(s): 

Required: 


[1] Bernard Sklar, Digital Communications: Fundamentals and 
Applications, 2nd edition, 2001, Prentice Hall. 


Recommended: 


[2] John Proakis, Digital Communications, 4th edition, 2001, McGraw- 
Hill. 


[3] Bruce Carlson et al., Communication Systems: An Introduction to 
Signals and Noise in Electrical Communication, 4th edition, 2001, 
McGraw-Hill. 


[4] Rodger E. Ziemer, Roger W. Peterson, Introduction to Digital 
Communication, 2nd edition, 2000, Prentice Hall. 


Purpose of the Course 

Title: Principles of Digital Communications 
Credits: 3 (4 hours/week, 15 weeks/semester) 
Course Rationale: 


Wireless communication is fundamentally the art of communicating 
information without wires. In principle, wireless communication 
encompasses any number of techniques including underwater acoustic 
communication, radio communication, and satellite communication, among 
others. The term was coined in the early days of radio, fell out of fashion 
for about fifty years, and was rediscovered during the cellular telephony 
revolution. Wireless now implies communication using electromagnetic 
waves - placing it within the domain of electrical engineering. Wireless 
communication techniques can be classified as either analog or digital. The 
first commercial systems were analog including AM radio, FM radio, 
television, and first generation cellular systems. Analog communication is 
gradually being replaced with digital communication. The fundamental 
difference between the two is that in digital communication, the source is 
assumed to be digital. Every major wireless system being developed and 
deployed is built around digital communication including cellular 
communication, wireless local area networking, personal area networking, 
and high-definition television. Thus this course will focus on digital 
wireless communication. 


This course is a required core course in communications engineering which 
introduces principles of digital communications while reinforcing concepts 
learned in analog communications systems. It is intended to provide 
comprehensive coverage of digital communication systems for final-year 
undergraduate students, first-year graduate students, and practicing 
engineers. 


Pre-requisites: Communication Systems. Thorough knowledge of Signals 
and Systems, Linear Algebra, Digital Signal Processing, and Probability 
Theory and Stochastic Processes is essential. 


Course Description 


This course explores elements of the theory and practice of digital 
communications. The course will 1) model and study the effects of channel 
impairments such as distortion, noise, interference, and fading on the 
performance of communication systems; and 2) introduce signal processing, 
modulation, and coding techniques that are used in digital communication 
systems. The following concepts and tools are covered in this course: 

Signals and Systems 

• Classification of signals and systems 
• Orthogonal functions, Fourier series, Fourier transform 
• Spectra and filtering 
• Sampling theory, Nyquist theorem 
• Random processes, autocorrelation, power spectrum 
• Systems with random input/output 

Source Coding 

• Elements of compression, Huffman coding 
• Elements of quantization theory 
• Pulse code modulation (PCM) and variations 
• Rate/bandwidth calculations in communication systems 

Communication over AWGN Channels 

• Signals and noise, Eb/N0 
• Receiver structure, demodulation and detection 
• Correlation receiver and matched filter 
• Detection of binary signals in AWGN 
• Optimal detection for general modulation 
• Coherent and non-coherent detection 

Communication over Band-limited AWGN Channel 

• ISI in band-limited channels 
• Zero-ISI condition: the Nyquist criterion 
• Raised cosine filters 
• Partial response signals 
• Equalization using zero-forcing criterion 

Channel Coding 

• Types of error control 
• Block codes 
• Error detection and correction 
• Convolutional codes and the Viterbi algorithm 

Communication over Fading Channel 

• Fading channels 
• Characterizing mobile-radio propagation 
• Signal time-spreading 
• Mitigating the effects of fading 
• Application of Viterbi equalizer in GSM system 
• Application of Rake receiver in CDMA system 


Calendar 

Week 1: Overview of signals and spectra 

Week 2: Source coding 

Week 3: Receiver structure, demodulation and detection 


Week 4: Correlation receiver and matched filter. Detection of binary signals 
in AWGN 


Week 5: Optimal detection for general modulation. Coherent and 
non-coherent detection (I) 


Week 6: Coherent and non-coherent detection (II) 


Week 7: ISI in band-limited channels. Zero-ISI condition: the Nyquist 
criterion 


Week 8: Mid-term exam 

Week 9: Raised cosine filters. Partial response signals 

Week 10: Channel equalization 

Week 11: Channel coding. Block codes 

Week 12: Convolutional codes 

Week 13: Viterbi algorithm 

Week 14: Fading channel. Characterizing mobile-radio propagation 
Week 15: Mitigating the effects of fading 


Week 16: Applications of Viterbi equalizer and Rake receiver in GSM and 
CDMA systems 


Week 17: Final exam 


Grading Procedures 
Homework/Participation/Exams: 


• Homework and Programming Assignments 
• Midterm Exam 
• Final Exam 


Homework and programming assignments will be given to test students' 
knowledge and understanding of the covered topics. Homework and 
programming assignments will be assigned frequently throughout the 
course and will be due at the time and place indicated on the assignment. 
Homework and programming assignments must be done individually by 
each student without collaboration with others. No late homework will be 
accepted. 


There will be in-class mid-term and final exams. The mid-term exam and 
the final exam will be time-limited to 60 minutes and 120 minutes, 
respectively. They will be closed book and closed notes. It is recommended 
that students practice working problems from the book, example 
problems, and homework problems. 


Participation: Questions and discussion in class are encouraged. Participation 
will be noted. 


Grades for this course will be based on the following weighting: 


• Homework and In-class Participation: 20% 
• Programming Assignments: 20% 
• Mid-term Exam: 20% 
• Final Exam: 40% 


Signal Classifications and Properties 
Describes various classifications of signals. 


Introduction 


This module will begin our study of signals and systems by laying out some 
of the fundamentals of signal classification. It is essentially an introduction 
to the important definitions and properties that are fundamental to the 
discussion of signals and systems, with a brief discussion of each. 


Classifications of Signals 


Continuous-Time vs. Discrete-Time 


As the names suggest, this classification is determined by whether the 
time axis is discrete (countable) or continuous ([link]). A continuous-time 
signal has a value for every real number along the time axis. In 
contrast, a discrete-time signal, often created by sampling a 
continuous signal, has values only at equally spaced points along 
the time axis. 


Figure: the time axis is either continuous or discrete. 


Analog vs. Digital 


The difference between analog and digital is similar to the difference 
between continuous-time and discrete-time. However, in this case the 
difference involves the values of the function. Analog corresponds to a 
continuous set of possible function values, while digital corresponds to a 
discrete set of possible function values. A common example of a digital 
signal is a binary sequence, where the values of the function can only be 
one or zero. 


Figure: the value axis is either continuous or discrete. 


Periodic vs. Aperiodic 


Periodic signals repeat with some period $T$, while aperiodic, or 
nonperiodic, signals do not ([link]). We can define a periodic function 
through the following mathematical expression, where $t$ can be any number 
and $T$ is a positive constant: 

Equation: 


$$f(t) = f(t + T)$$ 


The fundamental period of our function, $f(t)$, is the smallest value of $T$ 
that still allows [link] to be true. 


A periodic signal with period $T_0$ 


An aperiodic signal 


Finite vs. Infinite Length 


Another way of classifying a signal is in terms of its length along its time 
axis. Is the signal defined for all possible values of time, or for only certain 
values of time? Mathematically speaking, $f(t)$ is a finite-length signal if it 
is defined only over a finite interval 


$$t_1 < t < t_2$$ 


where $t_1 < t_2$. Similarly, an infinite-length signal, $f(t)$, is defined for all 
values: 


$$-\infty < t < \infty$$ 


Causal vs. Anticausal vs. Noncausal 


Causal signals are signals that are zero for all negative time, while 
anticausal signals are signals that are zero for all positive time. Noncausal 
signals are signals that have nonzero values in both positive and negative 
time ([link]). 


Figures: a causal signal (zero for negative time), an anticausal signal 
(zero for positive time), and a noncausal signal. 


Even vs. Odd 


An even signal is any signal $f$ such that $f(t) = f(-t)$. Even signals can be 
easily spotted as they are symmetric around the vertical axis. An odd 
signal, on the other hand, is a signal $f$ such that $f(t) = -f(-t)$ ([link]). 


Figures: an even signal $f_e(t)$ and an odd signal $f_o(t)$. 


Using the definitions of even and odd signals, we can show that any signal 
can be written as a combination of an even and odd signal. That is, every 
signal has an odd-even decomposition. To demonstrate this, we have to look 
no further than a single equation. 

Equation: 


$$f(t) = \frac{1}{2}\left(f(t) + f(-t)\right) + \frac{1}{2}\left(f(t) - f(-t)\right)$$ 


By multiplying and adding this expression out, it can be shown to be true. 
Also, it can be shown that $f(t) + f(-t)$ fulfills the requirement of an even 
function, while $f(t) - f(-t)$ fulfills the requirement of an odd function 
([link]). 
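The decomposition above can be checked numerically. The following is a minimal sketch, not part of the original module: the test signal (a one-sided decaying exponential) and the helper names `even_part` and `odd_part` are assumptions chosen for illustration.

```python
import math

def even_part(f):
    """Return the even part e(t) = (f(t) + f(-t)) / 2 as a new function."""
    return lambda t: 0.5 * (f(t) + f(-t))

def odd_part(f):
    """Return the odd part o(t) = (f(t) - f(-t)) / 2 as a new function."""
    return lambda t: 0.5 * (f(t) - f(-t))

def f(t):
    # Hypothetical test signal: one-sided decaying exponential (neither even nor odd).
    return math.exp(-t) if t >= 0 else 0.0

e, o = even_part(f), odd_part(f)
ts = [k / 10 for k in range(-50, 51)]  # sample times from -5 to 5

# e(t) + o(t) should reproduce f(t); e should be even and o should be odd.
reconstruction_error = max(abs(e(t) + o(t) - f(t)) for t in ts)
evenness_error = max(abs(e(t) - e(-t)) for t in ts)
oddness_error = max(abs(o(t) + o(-t)) for t in ts)
print(reconstruction_error, evenness_error, oddness_error)
```

All three errors should be at the level of floating-point rounding, which matches the claim that the two parts are respectively even and odd and sum back to the original signal.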


Example: 


The signal $f(t)$ we will decompose using odd-even decomposition. 


Even part: $e(t) = \frac{1}{2}\left(f(t) + f(-t)\right)$ 


Odd part: $o(t) = \frac{1}{2}\left(f(t) - f(-t)\right)$ 


Check: $e(t) + o(t) = f(t)$ 


Deterministic vs. Random 


A deterministic signal is a signal in which each value of the signal is fixed, 
being determined by a mathematical expression, rule, or table. On the other 
hand, the values of a random signal are not strictly defined, but are subject 
to some amount of variability. 


Figures: a deterministic signal and a random signal. 


Example: 
Consider the signal defined for all real $t$ described by 
Equation: 


$$f(t) = \begin{cases} \sin(2\pi t)/t & t \ge 1 \\ 0 & t < 1 \end{cases}$$ 


This signal is continuous time, analog, aperiodic, infinite length, causal, 
neither even nor odd, and, by definition, deterministic. 


Signal Classifications Summary 


This module describes just some of the many ways in which signals can be 
classified. They can be continuous time or discrete time, analog or digital, 
periodic or aperiodic, finite or infinite, and deterministic or random. We can 
also divide them based on their causality and symmetry properties. 


System Classifications and Properties 
Describes various classifications of systems. 


Introduction 


In this module some of the basic classifications of systems will be briefly 
introduced and the most important properties of these systems are 
explained. As can be seen, the properties of a system provide an easy way 
to distinguish one system from another. Understanding these basic 
differences between systems, and their properties, will be a fundamental 
concept used in all signal and system courses. Once a set of systems can be 
identified as sharing particular properties, one no longer has to reprove a 
certain characteristic of a system each time; it follows directly from 
the system classification. 


Classification of Systems 


Continuous vs. Discrete 


One of the most important distinctions to understand is the difference 
between discrete time and continuous time systems. A system in which the 
input signal and output signal both have continuous domains is said to be a 
continuous system. One in which the input signal and output signal both 
have discrete domains is said to be a discrete system. Of course, it is 
possible to conceive of systems that belong to neither category, such as 
systems in which sampling of a continuous time signal or reconstruction 
from a discrete time signal takes place. 


Linear vs. Nonlinear 


A linear system is any system that obeys the properties of scaling (first 
order homogeneity) and superposition (additivity) further described below. 
A nonlinear system is any system that does not have at least one of these 
properties. 


To show that a system $H$ obeys the scaling property is to show that 
Equation: 


$$H(k f(t)) = k H(f(t))$$ 


A block diagram demonstrating the scaling property of linearity 


To demonstrate that a system $H$ obeys the superposition property of 
linearity is to show that 


Equation: 

$$H(f_1(t) + f_2(t)) = H(f_1(t)) + H(f_2(t))$$ 


A block diagram demonstrating the superposition 
property of linearity 


It is possible to check a system for linearity in a single (though larger) step. 
To do this, simply combine the first two steps to get 
Equation: 


$$H(k_1 f_1(t) + k_2 f_2(t)) = k_1 H(f_1(t)) + k_2 H(f_2(t))$$ 
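The single-step linearity condition lends itself to a numerical spot check. The sketch below is illustrative only: the two example systems (`gain_and_delay`, `squarer`), the test inputs, and the constants are assumptions, and a system is modeled as a Python function that maps an input signal to an output signal. Passing the check on a few inputs does not prove linearity, but a single failure is enough to show a system is nonlinear.

```python
import math

def is_linear(H, f1, f2, k1=2.0, k2=-3.0, ts=None, tol=1e-9):
    """Spot-check H(k1*f1 + k2*f2) == k1*H(f1) + k2*H(f2) on sample times."""
    ts = ts if ts is not None else [k / 10 for k in range(-20, 21)]
    combined = lambda t: k1 * f1(t) + k2 * f2(t)
    lhs, y1, y2 = H(combined), H(f1), H(f2)
    return all(abs(lhs(t) - (k1 * y1(t) + k2 * y2(t))) <= tol for t in ts)

# Hypothetical systems: each maps an input function to an output function.
def gain_and_delay(f):
    return lambda t: 3.0 * f(t - 1.0)   # y(t) = 3 f(t - 1)

def squarer(f):
    return lambda t: f(t) ** 2          # y(t) = f(t)^2

gain_delay_linear = is_linear(gain_and_delay, math.sin, math.cos)
squarer_linear = is_linear(squarer, math.sin, math.cos)
print(gain_delay_linear, squarer_linear)
```

The gain-and-delay system passes the check, while the squarer fails it, because squaring a weighted sum introduces cross terms that the weighted sum of squares does not contain.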


Time Invariant vs. Time Varying 


A system is said to be time invariant if it commutes with the parameter shift 
operator defined by $S_T(f(t)) = f(t - T)$ for all $T$, which is to say 
Equation: 


$$H S_T = S_T H$$ 


for all real $T$. Intuitively, that means that for any input function that 
produces some output function, any time shift of that input function will 
produce an output function identical in every way except that it is shifted by 
the same amount. Any system that does not have this property is said to be 
time varying. 


This block diagram shows the condition for time invariance. The output is 
the same whether the delay is put on the input or the output. 
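The commutation condition can also be spot-checked numerically by comparing "shift, then apply the system" with "apply the system, then shift". This is a rough sketch under assumptions: the example systems (`delay_gain`, `ramp_modulator`), the test input, and the shift amount are hypothetical, and agreement on sampled times only suggests time invariance rather than proving it.

```python
import math

def shift(f, T):
    """Shift operator S_T: (S_T f)(t) = f(t - T)."""
    return lambda t: f(t - T)

def commutes_with_shift(H, f, T=0.7, ts=None, tol=1e-9):
    """Compare H(S_T f) with S_T(H f) on a grid of sample times."""
    ts = ts if ts is not None else [k / 10 for k in range(-20, 21)]
    shifted_then_system = H(shift(f, T))
    system_then_shifted = shift(H(f), T)
    return all(abs(shifted_then_system(t) - system_then_shifted(t)) <= tol for t in ts)

# Hypothetical systems for illustration.
def delay_gain(f):
    return lambda t: 3.0 * f(t - 1.0)   # y(t) = 3 f(t - 1)

def ramp_modulator(f):
    return lambda t: t * f(t)           # y(t) = t f(t)

delay_gain_invariant = commutes_with_shift(delay_gain, math.sin)
ramp_invariant = commutes_with_shift(ramp_modulator, math.sin)
print(delay_gain_invariant, ramp_invariant)
```

The ramp modulator fails because its gain depends explicitly on time: delaying the input gives $t\,f(t-T)$, whereas delaying the output gives $(t-T)\,f(t-T)$.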


Causal vs. Noncausal 


A causal system is one in which the output depends only on current or past 
inputs, but not future inputs. Similarly, an anticausal system is one in which 
the output depends only on current or future inputs, but not past inputs. 
Finally, a noncausal system is one in which the output depends on both past 
and future inputs. All "realtime" systems must be causal, since they cannot 
have future inputs available to them. 


One may think the idea of future inputs does not seem to make much 
physical sense; however, we have only been dealing with time as our 
independent variable so far, which is not always the case. Imagine rather that 
we wanted to do image processing. Then the independent variable might 
represent pixel positions to the left and right (the "future") of the current 
position on the image, and we would not necessarily have a causal system. 


Figure: For a typical system to be causal, the output at time $t_0$, $y(t_0)$, 
can only depend on the portion of the input signal at or before $t_0$. 


Stable vs. Unstable 


There are several definitions of stability, but the one that will be used most 
frequently in this course is bounded-input, bounded-output (BIBO) 
stability. In this context, a stable system is one in which every bounded 
input produces a bounded output. Similarly, an unstable system is one in 
which at least one bounded input produces an unbounded output. 


Representing this mathematically, a stable system must have the following 
property, where $x(t)$ is the input and $y(t)$ is the output. The output must 
satisfy the condition 

Equation: 


$$|y(t)| \le M_y < \infty$$ 


whenever we have an input to the system that satisfies 
Equation: 


$$|x(t)| \le M_x < \infty$$ 


$M_x$ and $M_y$ are finite positive numbers, and these 
relationships hold for all $t$. Otherwise, the system is unstable. 
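As a rough illustration of the BIBO condition, the sketch below uses a discrete-time analogue (an assumption made for ease of simulation; the text above is stated in continuous time). It drives two hypothetical first-order recursions $y[n] = a\,y[n-1] + x[n]$ with the same bounded input, a unit step with $M_x = 1$. A finite simulation cannot prove stability, but it does show how one bounded input can expose an unstable system.

```python
def run_recursion(a, x):
    """Simulate y[n] = a * y[n-1] + x[n] with zero initial condition."""
    y, out = 0.0, []
    for xn in x:
        y = a * y + xn
        out.append(y)
    return out

step = [1.0] * 1000                 # bounded input: |x[n]| <= 1

leaky = run_recursion(0.5, step)    # hypothetical leaky accumulator, a = 0.5
accum = run_recursion(1.0, step)    # hypothetical pure accumulator, a = 1

leaky_peak = max(abs(v) for v in leaky)
accum_peak = max(abs(v) for v in accum)
print(leaky_peak, accum_peak)
```

For the leaky accumulator the output never exceeds 2 (the geometric series $1 + 0.5 + 0.25 + \dots$), so the step response is bounded. For the pure accumulator the output grows linearly with the number of samples, so no finite $M_y$ exists for this bounded input.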


System Classifications Summary 


This module describes just some of the many ways in which systems can be 
classified. Systems can be continuous time, discrete time, or neither. They 
can be linear or nonlinear, time invariant or time varying, and stable or 
unstable. We can also divide them based on their causality properties. There 
are other ways to classify systems, such as use of memory, that are not 
discussed here but will be described in subsequent modules. 


Theorems on the Fourier Series 


Four of the most important theorems in the theory of Fourier analysis are 
the inversion theorem, the convolution theorem, the differentiation theorem, 
and Parseval's theorem [link]. All of these are based on the orthogonality of 
the basis functions of the Fourier series and integral and all require 
knowledge of the convergence of the sums and integrals. The practical and 
theoretical use of Fourier analysis is greatly expanded if use is made of 
distributions or generalized functions [link][link]. Because energy is an 
important measure of a function in signal processing applications, the 
Hilbert space of $L^2$ functions is a proper setting for the basic theory and a 
geometric view can be especially useful [link][link]. 


The following theorems and results concern the existence and convergence 
of the Fourier series and the discrete-time Fourier transform [link]. Details, 
discussions and proofs can be found in the cited references. 


• If $f(x)$ has bounded variation in the interval $(-\pi, \pi)$, the Fourier 
series corresponding to $f(x)$ converges to the value $f(x)$ at any point 
within the interval at which the function is continuous; it converges to 
the value $\frac{1}{2}[f(x + 0) + f(x - 0)]$ at any such point at which the 
function is discontinuous. At the points $\pi$, $-\pi$ it converges to the value 
$\frac{1}{2}[f(-\pi + 0) + f(\pi - 0)]$. [link] 

• If $f(x)$ is of bounded variation in $(-\pi, \pi)$, the Fourier series 
converges to $f(x)$, uniformly in any interval $(a, b)$ in which $f(x)$ is 
continuous, the continuity at $a$ and $b$ being on both sides. [link] 

• If $f(x)$ is of bounded variation in $(-\pi, \pi)$, the Fourier series 
converges to $\frac{1}{2}[f(x + 0) + f(x - 0)]$, bounded throughout the 
interval $(-\pi, \pi)$. [link] 

• If $f(x)$ is bounded and if it is continuous in its domain at every point, 
with the exception of a finite number of points at which it may have 
ordinary discontinuities, and if the domain may be divided into a finite 
number of parts, such that in any one of them the function is 
monotone; or, in other words, the function has only a finite number of 
maxima and minima in its domain, the Fourier series of $f(x)$ 
converges to $f(x)$ at points of continuity and to 
$\frac{1}{2}[f(x + 0) + f(x - 0)]$ at points of discontinuity. [link][link] 

• If $f(x)$ is such that, when the arbitrarily small neighborhoods of a 
finite number of points in whose neighborhood $|f(x)|$ has no upper 
bound have been excluded, $f(x)$ becomes a function with bounded 
variation, then the Fourier series converges to the value 
$\frac{1}{2}[f(x + 0) + f(x - 0)]$ at every point in $(-\pi, \pi)$, except the points 
of infinite discontinuity of the function, provided the improper integral 
$\int_{-\pi}^{\pi} f(x)\,dx$ exists and is absolutely convergent. [link] 

• If $f$ is of bounded variation, the Fourier series of $f$ converges at every 
point $x$ to the value $[f(x + 0) + f(x - 0)]/2$. If $f$ is, in addition, 
continuous at every point of an interval $I = (a, b)$, its Fourier series is 
uniformly convergent in $I$. [link] 

• If $a(k)$ and $b(k)$ are absolutely summable, the Fourier series converges 
uniformly to $f(x)$, which is continuous. [link] 

• If $a(k)$ and $b(k)$ are square summable, the Fourier series converges to 
$f(x)$ where it is continuous, but not necessarily uniformly. [link] 

• Suppose that $f(x)$ is periodic, of period $X$, is defined and bounded on 
$[0, X]$ and that at least one of the following four conditions is satisfied: 
(i) $f$ is piecewise monotonic on $[0, X]$, (ii) $f$ has a finite number of 
maxima and minima on $[0, X]$ and a finite number of discontinuities 
on $[0, X]$, (iii) $f$ is of bounded variation on $[0, X]$, (iv) $f$ is piecewise 
smooth on $[0, X]$: then it will follow that the Fourier series coefficients 
may be defined through the defining integral, using proper Riemann 
integrals, and that the Fourier series converges to $f(x)$ at a.a. $x$, to 
$f(x)$ at each point of continuity of $f$, and to the value 
$\frac{1}{2}[f(x^-) + f(x^+)]$ at all $x$. [link] 

• For any $1 \le p < \infty$ and any $f \in C^p(S^1)$, the partial sums $S_n$ of the 
Fourier series of $f$ converge to $f$, uniformly as $n \to \infty$; in fact, 
$\|S_n - f\|_\infty$ is bounded by a constant multiple of $n^{-p+1/2}$. [link] 


The Fourier series expansion results in transforming a periodic, continuous 
time function, $\tilde{x}(t)$, to two discrete indexed frequency functions, $a(k)$ and 
$b(k)$, that are not periodic. 
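The convergence statements above can be made concrete with a standard example that is not taken from this module: the square wave equal to $+1$ on $(0, \pi)$ and $-1$ on $(-\pi, 0)$, whose Fourier series is $\frac{4}{\pi}\sum_{k\ \text{odd}} \frac{\sin(kx)}{k}$. The sketch below (function name and truncation order are assumptions) evaluates a partial sum at the jump $x = 0$ and at an interior point of continuity.

```python
import math

def square_wave_partial_sum(x, n_max):
    """Partial Fourier sum of the +/-1 square wave, using odd harmonics up to n_max."""
    return (4 / math.pi) * sum(math.sin(k * x) / k for k in range(1, n_max + 1, 2))

# At the discontinuity x = 0 the one-sided limits are +1 and -1, so the series
# should converge to their average, 0.
value_at_jump = square_wave_partial_sum(0.0, 2001)

# At x = pi/2 the function is continuous and equal to 1.
value_inside = square_wave_partial_sum(math.pi / 2, 2001)

print(value_at_jump, value_inside)
```

At the jump every sine term vanishes, so the partial sums equal the midpoint $\frac{1}{2}[f(0+0) + f(0-0)] = 0$ exactly, while at $x = \pi/2$ the partial sums approach 1 slowly, as expected for coefficients that decay only like $1/k$.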


The Fourier Transform 


Many practical problems in signal analysis involve either infinitely long or 
very long signals where the Fourier series is not appropriate. For these 
cases, the Fourier transform (FT) and its inverse (IFT) have been 
developed. This transform has been used with great success in virtually all 
quantitative areas of science and technology where the concept of 
frequency is important. While the Fourier series was used before Fourier 
worked on it, the Fourier transform seems to be his original idea. It can be 
derived as an extension of the Fourier series by letting the length increase to 
infinity or the Fourier transform can be independently defined and then the 
Fourier series shown to be a special case of it. The latter approach is the 
more general of the two, but the former is more intuitive [link][link]. 


Definition of the Fourier Transform 


The Fourier transform (FT) of a real-valued (or complex) function of the 
real variable $t$ is defined by 
Equation: 


$$X(\omega) = \int_{-\infty}^{\infty} x(t)\, e^{-j\omega t}\, dt$$ 


giving a complex valued function of the real variable $\omega$ representing 
frequency. The inverse Fourier transform (IFT) is given by 
Equation: 


$$x(t) = \frac{1}{2\pi} \int_{-\infty}^{\infty} X(\omega)\, e^{j\omega t}\, d\omega.$$ 


Because of the infinite limits on both integrals, the question of convergence 
is important. There are useful practical signals that do not have Fourier 
transforms if only classical functions are allowed because of problems with 
convergence. The use of delta functions (distributions) in both the time and 
frequency domains allows a much larger class of signals to be represented 
[link]. 
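For signals where the integral does converge, the definition can be evaluated numerically. The sketch below is an illustration under assumptions: it uses a unit-width rectangular pulse (not an example from this module), whose transform is known in closed form as $\sin(\omega/2)/(\omega/2)$, and approximates the defining integral with a simple midpoint rule. The function names and the number of integration steps are hypothetical choices.

```python
import cmath
import math

def fourier_transform(x, omega, t_min, t_max, n=5000):
    """Midpoint-rule approximation of the integral of x(t) exp(-j omega t) over [t_min, t_max]."""
    dt = (t_max - t_min) / n
    total = 0j
    for i in range(n):
        t = t_min + (i + 0.5) * dt
        total += x(t) * cmath.exp(-1j * omega * t)
    return total * dt

def rect(t):
    """Unit-height pulse on |t| <= 1/2."""
    return 1.0 if abs(t) <= 0.5 else 0.0

def rect_transform(omega):
    """Closed-form transform of rect: sin(omega/2) / (omega/2), equal to 1 at omega = 0."""
    return 1.0 if omega == 0 else math.sin(omega / 2) / (omega / 2)

omegas = [0.0, 1.0, 2 * math.pi, 10.0]
max_error = max(abs(fourier_transform(rect, w, -0.5, 0.5) - rect_transform(w)) for w in omegas)
print(max_error)
```

The integration limits are restricted to the support of the pulse, which is why a finite sum suffices here; a signal that does not decay would need the distribution-based treatment mentioned above.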


Examples of the Fourier Transform 


Deriving a few basic transforms and using the properties allows a large 
class of signals to be easily studied. Examples of modulation, sampling, and 
others will be given. 


• If $x(t) = \delta(t)$ then $X(\omega) = 1$ 

• If $x(t) = 1$ then $X(\omega) = 2\pi\delta(\omega)$ 

• If $x(t)$ is an infinite sequence of delta functions spaced $T$ apart, 
$x(t) = \sum_{n=-\infty}^{\infty} \delta(t - nT)$, its transform is also an infinite sequence 
of delta functions of weight $2\pi/T$ spaced $2\pi/T$ apart, 
$X(\omega) = \frac{2\pi}{T} \sum_{k=-\infty}^{\infty} \delta(\omega - 2\pi k/T)$. 

• Other interesting and illustrative examples can be found in [link][link]. 


Note the Fourier transform takes a function of continuous time into a 
function of continuous frequency, neither function being periodic. If 
distributions or delta functions are allowed, the Fourier transform of a 
periodic function will be an infinitely long string of delta functions with 
weights that are the Fourier series coefficients. 


Review of Probability Theory 


The focus of this course is on digital communication, which involves transmission of 
information, in its most general sense, from source to destination using digital technology. 
Engineering such a system requires modeling both the information and the transmission 
media. Interestingly, modeling both digital or analog information and many physical 
media requires a probabilistic setting. In this chapter and in the next one we will review 
the theory of probability, model random signals, and characterize their behavior as they 
traverse through deterministic systems disturbed by noise and interference. In order to 
develop practical models for random phenomena we start with carrying out a random 
experiment. We then introduce definitions, rules, and axioms for modeling within the 
context of the experiment. The outcome of a random experiment is denoted by $\omega$. The 
sample space $\Omega$ is the set of all possible outcomes of a random experiment. Such 
outcomes could be an abstract description in words. A scientific experiment should indeed 
be repeatable where each outcome could naturally have an associated probability of 
occurrence. This is defined formally as the ratio of the number of times the outcome 
occurs to the total number of times the experiment is repeated, as the number of 
repetitions grows large. 


Random Variables 


A random variable is the assignment of a real number to each outcome of a random 
experiment: $\omega \mapsto X(\omega)$. 


Example: 

Roll a die. Outcomes $\{\omega_1, \omega_2, \omega_3, \omega_4, \omega_5, \omega_6\}$ 
$\omega_i$ = $i$ dots on the face of the die. 

$X(\omega_i) = i$ 


Distributions 


Probability assignments on intervals $a < X \le b$ 


Cumulative distribution 
The cumulative distribution function of a random variable $X$ is a function 
$F_X : \mathbb{R} \to \mathbb{R}$ such that 


Equation: 
$$F_X(b) = \Pr[X \le b] = \Pr[\{\omega \in \Omega \mid X(\omega) \le b\}]$$ 


Continuous Random Variable 
A random variable $X$ is continuous if the cumulative distribution function can be 
written in an integral form, or 
Equation: 

$$F_X(b) = \int_{-\infty}^{b} f_X(x)\, dx$$ 

and $f_X(x)$ is the probability density function (pdf) (e.g., $F_X(x)$ is differentiable and 
$f_X(x) = \frac{d}{dx} F_X(x)$) 


Discrete Random Variable 
A random variable $X$ is discrete if it takes at most countably many values (i.e., 
$F_X(\cdot)$ is piecewise constant). The probability mass function (pmf) is defined as 
Equation: 


$$p_X(x_k) = \Pr[X = x_k] = F_X(x_k) - \lim_{x \to x_k,\ x < x_k} F_X(x)$$ 


Two random variables defined on an experiment have joint distribution 


Equation: 


$$F_{X,Y}(a, b) = \Pr[X \le a, Y \le b] = \Pr[\{\omega \in \Omega \mid (X(\omega) \le a) \wedge (Y(\omega) \le b)\}]$$ 


Joint pdf can be obtained if they are jointly continuous 


Equation: 
$$F_{X,Y}(a, b) = \int_{-\infty}^{b} \int_{-\infty}^{a} f_{X,Y}(x, y)\, dx\, dy$$ 
(e.g., $f_{X,Y}(x, y) = \frac{\partial^2 F_{X,Y}(x, y)}{\partial x\, \partial y}$) 


Joint pmf if they are jointly discrete 
Equation: 


$$p_{X,Y}(x_k, y_l) = \Pr[X = x_k, Y = y_l]$$ 
Conditional density function 
Equation: 


$$f_{Y|X}(y|x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$ 


for all $x$ with $f_X(x) > 0$; the conditional density is not defined for those values of $x$ 
with $f_X(x) = 0$. 


Two random variables are independent if 
Equation: 


$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$ 
for all $x \in \mathbb{R}$ and $y \in \mathbb{R}$. For discrete random variables, 
Equation: 


$$p_{X,Y}(x_k, y_l) = p_X(x_k)\, p_Y(y_l)$$ 


for all $k$ and $l$. 


Moments 


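Statistical quantities to represent some of the characteristics of a random variable. 
Equation: 


$$\overline{g(X)} = E[g(X)] = \begin{cases} \int_{-\infty}^{\infty} g(x) f_X(x)\, dx & \text{if continuous} \\ \sum_k g(x_k)\, p_X(x_k) & \text{if discrete} \end{cases}$$ 


• Mean 
Equation: 
$$\mu_X = \overline{X}$$ 
• Second moment 
Equation: 
$$E[X^2] = \overline{X^2}$$ 
• Variance 
Equation: 
$$\operatorname{Var}(X) = \sigma_X^2 = \overline{(X - \mu_X)^2} = \overline{X^2} - \mu_X^2$$ 
• Characteristic function 
Equation: 
$$\Phi_X(u) = \overline{e^{iuX}}$$ 
for $u \in \mathbb{R}$, where $i = \sqrt{-1}$ 
• Correlation between two random variables 
Equation: 
$$R_{XY} = \overline{XY^*} = \begin{cases} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} x y^* f_{X,Y}(x, y)\, dx\, dy & \text{if } X \text{ and } Y \text{ are jointly continuous} \\ \sum_k \sum_l x_k y_l^*\, p_{X,Y}(x_k, y_l) & \text{if } X \text{ and } Y \text{ are jointly discrete} \end{cases}$$ 
• Covariance 
Equation: 
$$C_{XY} = \operatorname{Cov}(X, Y) = \overline{(X - \mu_X)(Y - \mu_Y)^*} = R_{XY} - \mu_X \mu_Y^*$$ 
• Correlation coefficient 
Equation: 
$$\rho_{XY} = \frac{\operatorname{Cov}(X, Y)}{\sigma_X \sigma_Y}$$ 


Uncorrelated random variables 
Two random variables $X$ and $Y$ are uncorrelated if $\rho_{XY} = 0$. 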
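To make these definitions concrete, the sketch below computes the moments of a small joint pmf. The pmf values and the helper name `expect` are assumptions invented for illustration; the variables are real valued, so the complex conjugates in the general formulas have no effect here.

```python
import math

# Hypothetical joint pmf p_XY(x, y) of two binary random variables.
joint = {(0, 0): 0.4, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.4}

def expect(g):
    """E[g(X, Y)] = sum over (x, y) of g(x, y) * p_XY(x, y)."""
    return sum(p * g(x, y) for (x, y), p in joint.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - mean_x) ** 2)
var_y = expect(lambda x, y: (y - mean_y) ** 2)
r_xy = expect(lambda x, y: x * y)        # correlation R_XY
cov_xy = r_xy - mean_x * mean_y          # covariance C_XY
rho_xy = cov_xy / math.sqrt(var_x * var_y)

print(mean_x, var_x, r_xy, cov_xy, rho_xy)
```

For this pmf the means are 0.5, the variances are 0.25, the correlation is 0.4, the covariance is 0.15, and the correlation coefficient is 0.6 (up to floating-point rounding), so $X$ and $Y$ are correlated and therefore not independent.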


Introduction to Stochastic Processes 


Definitions, distributions, and stationarity 


Stochastic Process 
Given a sample space, a stochastic process is an indexed collection of random variables defined for each 
$\omega \in \Omega$. 
Equation: 


$$\forall t,\ t \in \mathbb{R} : (X_t(\omega))$$ 


Example: 
Received signal at an antenna as in [link]. 
Sample Paths 


For a given $t$, $X_t(\omega)$ is a random variable with a distribution 
Equation: 
First-order distribution 


$$F_{X_t}(b) = \Pr[X_t \le b] = \Pr[\{\omega \in \Omega \mid X_t(\omega) \le b\}]$$ 


First-order stationary process 
If $F_{X_t}(b)$ is not a function of time then $X_t$ is called a first-order stationary process. 


Equation: 
Second-order distribution 


$$F_{X_{t_1}, X_{t_2}}(b_1, b_2) = \Pr[X_{t_1} \le b_1, X_{t_2} \le b_2]$$ 


for all $t_1 \in \mathbb{R}$, $t_2 \in \mathbb{R}$, $b_1 \in \mathbb{R}$, $b_2 \in \mathbb{R}$ 
Equation: 


Nth-order distribution 


$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_N}}(b_1, b_2, \ldots, b_N) = \Pr[X_{t_1} \le b_1, \ldots, X_{t_N} \le b_N]$$ 


Nth-order stationary: A random process is stationary of order $N$ if, for every time shift $T$, 
Equation: 


$$F_{X_{t_1}, X_{t_2}, \ldots, X_{t_N}}(b_1, b_2, \ldots, b_N) = F_{X_{t_1+T}, X_{t_2+T}, \ldots, X_{t_N+T}}(b_1, b_2, \ldots, b_N)$$ 


Strictly stationary: A process is strictly stationary if it is Nth-order stationary for all $N$. 


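Example: 
$X_t = \cos(2\pi f_0 t + \Theta(\omega))$ where $f_0$ is the deterministic carrier frequency and $\Theta(\omega) : \Omega \to \mathbb{R}$ is a random 
variable defined over $[-\pi, \pi]$ and is assumed to be a uniform random variable; i.e., 


$$f_\Theta(\theta) = \begin{cases} \frac{1}{2\pi} & \text{if } \theta \in [-\pi, \pi] \\ 0 & \text{otherwise} \end{cases}$$ 
Equation: 
$$F_{X_t}(b) = \Pr[X_t \le b] = \Pr[\cos(2\pi f_0 t + \Theta) \le b]$$ 
Equation: 
$$F_{X_t}(b) = \Pr[-\pi \le 2\pi f_0 t + \Theta \le -\arccos(b)] + \Pr[\arccos(b) \le 2\pi f_0 t + \Theta \le \pi]$$ 
Here $|b| \le 1$, and the phase $2\pi f_0 t + \Theta$ is interpreted modulo $2\pi$, which is harmless because $\Theta$ is uniform over a full period. 
Equation: 
$$F_{X_t}(b) = \int_{-\pi - 2\pi f_0 t}^{-\arccos(b) - 2\pi f_0 t} \frac{1}{2\pi}\, d\theta + \int_{\arccos(b) - 2\pi f_0 t}^{\pi - 2\pi f_0 t} \frac{1}{2\pi}\, d\theta = (2\pi - 2\arccos(b)) \frac{1}{2\pi}$$ 
Equation: 
$$f_{X_t}(x) = \frac{d}{dx}\left(1 - \frac{1}{\pi}\arccos(x)\right) = \begin{cases} \frac{1}{\pi\sqrt{1 - x^2}} & \text{if } |x| \le 1 \\ 0 & \text{otherwise} \end{cases}$$ 


Since $F_{X_t}(b)$ does not depend on $t$, this process is stationary of order 1. 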
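First-order stationarity of the random-phase cosine can be checked by simulation. The following Monte Carlo sketch is an illustration under assumptions: the carrier frequency, the observation times, the threshold $b$, the trial count, and the seeds are arbitrary choices, and the comparison target is the distribution function $1 - \arccos(b)/\pi$ implied by the density stated in this example.

```python
import math
import random

def empirical_cdf(t, b, f0=1.0, trials=100_000, seed=0):
    """Estimate Pr[cos(2*pi*f0*t + Theta) <= b] with Theta uniform on [-pi, pi]."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        theta = rng.uniform(-math.pi, math.pi)
        if math.cos(2 * math.pi * f0 * t + theta) <= b:
            hits += 1
    return hits / trials

def theoretical_cdf(b):
    """F(b) = 1 - arccos(b)/pi for |b| <= 1, the same at every time t."""
    return 1 - math.acos(b) / math.pi

b = 0.3
times = [0.0, 0.13, 0.4]   # arbitrary observation times
estimates = [empirical_cdf(t, b, seed=i) for i, t in enumerate(times)]
errors = [abs(est - theoretical_cdf(b)) for est in estimates]
print(estimates, theoretical_cdf(b))
```

The estimates at the three different times should agree with each other and with the closed form to within Monte Carlo error (on the order of a few thousandths for this trial count), which is what first-order stationarity predicts.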


Plots of Cosines with different Phases and the same frequency 
The second order stationarity can be determined by first considering conditional densities and the joint 
density. Recall that 
Equation: 


$$X_t = \cos(2\pi f_0 t + \Theta)$$ 


Then the relevant step is to find 


Equation: 
$$\Pr[X_{t_2} \le b_2 \mid X_{t_1} = x_1]$$ 
Note that 
Equation: 
$$(X_{t_1} = x_1 = \cos(2\pi f_0 t_1 + \Theta)) \Rightarrow (\Theta = \arccos(x_1) - 2\pi f_0 t_1)$$ 
Equation: 


$$X_{t_2} = \cos(2\pi f_0 t_2 + \arccos(x_1) - 2\pi f_0 t_1) = \cos(2\pi f_0 (t_2 - t_1) + \arccos(x_1))$$ 


Equation: 


$$F_{X_{t_2}, X_{t_1}}(b_2, b_1) = \int_{-\infty}^{b_1} f_{X_{t_1}}(x_1) \Pr[X_{t_2} \le b_2 \mid X_{t_1} = x_1]\, dx_1$$ 


Note that this is only a function of $t_2 - t_1$. 


Example: 
Every T seconds, a fair coin is tossed. If heads, then X; = 1 fornT <t < (n+ 1)T. If tails, then 
X, = —lfornT <t < (n+1)T. 


X, f Sample function 


Equation: 


N|F wR 


Px (2) = 


for allt € R. X; is stationary of order 1. 
Second order probability mass function 
Equation: 


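$$p_{X_{t_1},X_{t_2}}(x_1,x_2) = p_{X_{t_2}|X_{t_1}}(x_2 \mid x_1)\, p_{X_{t_1}}(x_1)$$


The conditional pmf
Equation:
$$p_{X_{t_2}|X_{t_1}}(x_2 \mid x_1) = \begin{cases} 0 & \text{if } x_2 \ne x_1 \\ 1 & \text{if } x_2 = x_1 \end{cases}$$


when $nT \le t_1 < (n+1)T$ and $nT \le t_2 < (n+1)T$ for some $n$.
Equation:


$$p_{X_{t_2}|X_{t_1}}(x_2 \mid x_1) = p_{X_{t_2}}(x_2)$$


for all $x_1$ and for all $x_2$ when $nT \le t_1 < (n+1)T$ and $mT \le t_2 < (m+1)T$ with $n \ne m$
Equation:
$$p_{X_{t_2},X_{t_1}}(x_2,x_1) = \begin{cases} 0 & \text{if } x_2 \ne x_1 \text{ for } nT \le t_1, t_2 < (n+1)T \\ p_{X_{t_1}}(x_1) & \text{if } x_2 = x_1 \text{ for } nT \le t_1, t_2 < (n+1)T \\ p_{X_{t_1}}(x_1)\, p_{X_{t_2}}(x_2) & \text{if } n \ne m \text{ for } (nT \le t_1 < (n+1)T) \wedge (mT \le t_2 < (m+1)T) \end{cases}$$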


Second-order Description 


Second-order description 


Practical and incomplete statistics 


Mean 
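The mean function of a random process $X_t$ is defined as the expected value of
$X_t$ for all $t$.


Equation:
$$\mu_X(t) = E[X_t] = \begin{cases} \int_{-\infty}^{\infty} x f_{X_t}(x)\, dx & \text{if continuous} \\ \sum_{k=-\infty}^{\infty} x_k\, p_{X_t}(x_k) & \text{if discrete} \end{cases}$$
Autocorrelation
The autocorrelation function of the random process $X_t$ is defined as
Equation:


$$R_X(t_2,t_1) = E[X_{t_2}X_{t_1}] = \begin{cases} \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2},X_{t_1}}(x_2,x_1)\, dx_1\, dx_2 & \text{if continuous} \\ \sum_{k=-\infty}^{\infty}\sum_{l=-\infty}^{\infty} x_l x_k\, p_{X_{t_2},X_{t_1}}(x_l,x_k) & \text{if discrete} \end{cases}$$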


Fact 


If $X_t$ is second-order stationary, then $R_X(t_2,t_1)$ only depends on $t_2 - t_1$.
Equation:


$$R_X(t_2,t_1) = E[X_{t_2}X_{t_1}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2},X_{t_1}}(x_2,x_1)\, dx_2\, dx_1$$


Equation:


$$R_X(t_2,t_1) = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x_2 x_1 f_{X_{t_2-t_1},X_0}(x_2,x_1)\, dx_2\, dx_1 = R_X(t_2 - t_1, 0)$$


If $R_X(t_2,t_1)$ depends on $t_2 - t_1$ only, then we will represent the autocorrelation
with only one variable $\tau = t_2 - t_1$


Equation:
$$R_X(\tau) = R_X(t_2 - t_1) = R_X(t_2,t_1)$$
Properties
1. $R_X(0) \ge 0$
2. $R_X(\tau) = R_X(-\tau)$


Example: 

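$X_t = \cos(2\pi f_0 t + \Theta(\omega))$ and $\Theta$ is uniformly distributed between 0 and $2\pi$. The
mean function

Equation:


$$\mu_X(t) = E[X_t] = E[\cos(2\pi f_0 t + \Theta)] = \int_0^{2\pi} \cos(2\pi f_0 t + \theta) \frac{1}{2\pi}\, d\theta = 0$$


The autocorrelation function
Equation:
$$\begin{aligned} R_X(t+\tau,t) &= E[X_{t+\tau}X_t] \\ &= E[\cos(2\pi f_0 (t+\tau) + \Theta)\cos(2\pi f_0 t + \Theta)] \\ &= \tfrac{1}{2}E[\cos(2\pi f_0 \tau)] + \tfrac{1}{2}E[\cos(2\pi f_0 (2t+\tau) + 2\Theta)] \\ &= \tfrac{1}{2}\cos(2\pi f_0 \tau) + \tfrac{1}{2}\int_0^{2\pi} \cos(2\pi f_0 (2t+\tau) + 2\theta)\frac{1}{2\pi}\, d\theta \\ &= \tfrac{1}{2}\cos(2\pi f_0 \tau) \end{aligned}$$


Not a function of $t$ since the second term in the right hand side of the equality in
[link] is zero.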
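The mean and autocorrelation of the random-phase cosine can be verified with a short numerical sketch. The expectation over the uniform phase is approximated here by an average over an equally spaced grid on $[0, 2\pi)$; the helper names, the grid size, and $f_0 = 1$ are assumptions made for this illustration.

```python
import math

def phase_average(g, n_grid=20_000):
    """Average g(theta) for theta uniform on [0, 2*pi), using a midpoint grid."""
    total = 0.0
    for k in range(n_grid):
        total += g(2.0 * math.pi * (k + 0.5) / n_grid)
    return total / n_grid

def mean_fn(t, f0=1.0):
    """Approximate E[cos(2*pi*f0*t + Theta)]."""
    return phase_average(lambda th: math.cos(2.0 * math.pi * f0 * t + th))

def autocorr(t, tau, f0=1.0):
    """Approximate E[X_{t+tau} X_t] for the random-phase cosine."""
    return phase_average(
        lambda th: math.cos(2.0 * math.pi * f0 * (t + tau) + th)
        * math.cos(2.0 * math.pi * f0 * t + th)
    )

if __name__ == "__main__":
    print(mean_fn(0.37))
    print(autocorr(0.2, 0.1), autocorr(5.3, 0.1), 0.5 * math.cos(2.0 * math.pi * 0.1))
```

The autocorrelation printed for two different values of $t$ and the same lag should coincide with $\frac{1}{2}\cos(2\pi f_0 \tau)$.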


Example: 

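Toss a fair coin every $T$ seconds. Since $X_t$ is a discrete valued random process, the
statistical characteristics can be captured by the pmf and the mean function is
written as


Equation:
$$\mu_X(t) = E[X_t] = \tfrac{1}{2} \times (-1) + \tfrac{1}{2} \times 1 = 0$$
Equation:


$$R_X(t_2,t_1) = \sum_k \sum_l x_k x_l\, p_{X_{t_2},X_{t_1}}(x_k,x_l) = 1 \times 1 \times \tfrac{1}{2} + (-1) \times (-1) \times \tfrac{1}{2} = 1$$


when $nT \le t_1 < (n+1)T$ and $nT \le t_2 < (n+1)T$
Equation:


$$R_X(t_2,t_1) = 1 \times 1 \times \tfrac{1}{4} + (-1) \times (-1) \times \tfrac{1}{4} + (-1) \times 1 \times \tfrac{1}{4} + 1 \times (-1) \times \tfrac{1}{4} = 0$$


when $nT \le t_1 < (n+1)T$ and $mT \le t_2 < (m+1)T$ with $n \ne m$
Equation:

$$R_X(t_2,t_1) = \begin{cases} 1 & \text{if } (nT \le t_1 < (n+1)T) \wedge (nT \le t_2 < (n+1)T) \\ 0 & \text{otherwise} \end{cases}$$

A function of $t_1$ and $t_2$.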
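To see that this autocorrelation depends on $t_1$ and $t_2$ individually and not only on their difference, the expectation can be computed exactly by enumerating the coin outcomes. The following sketch is one way to do that; the function name and the default $T = 1$ are assumptions for the example.

```python
import itertools
import math

def coin_autocorr(t2, t1, T=1.0):
    """Exact E[X_{t2} X_{t1}] for the coin-toss process, by enumerating outcomes."""
    n1 = math.floor(t1 / T)   # index of the toss that determines X_{t1}
    n2 = math.floor(t2 / T)   # index of the toss that determines X_{t2}
    slots = sorted({n1, n2})  # only these tosses matter
    total = 0.0
    for outcome in itertools.product((1, -1), repeat=len(slots)):
        value = dict(zip(slots, outcome))
        prob = 0.5 ** len(slots)          # fair, independent tosses
        total += prob * value[n2] * value[n1]
    return total

if __name__ == "__main__":
    # Same lag (0.6) in both calls, but different results.
    print(coin_autocorr(0.8, 0.2))
    print(coin_autocorr(1.4, 0.8))
```

Both calls use the same lag $t_2 - t_1 = 0.6$, yet the first pair of times falls in one toss interval and the second pair straddles two, so the process is not wide sense stationary.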
Wide Sense Stationary 


A process is said to be wide sense stationary if $\mu_X$ is constant and $R_X(t_2,t_1)$
is only a function of $t_2 - t_1$.


Fact 


If $X_t$ is strictly stationary, then it is wide sense stationary. The converse is not
necessarily true.


Autocovariance 
Autocovariance of a random process is defined as 
Equation: 


$$C_X(t_2,t_1) = E[(X_{t_2} - \mu_X(t_2))(X_{t_1} - \mu_X(t_1))] = R_X(t_2,t_1) - \mu_X(t_2)\mu_X(t_1)$$


The variance of $X_t$ is $\mathrm{Var}(X_t) = C_X(t,t)$


Two processes defined on one experiment ([link]). 


(Figure: the processes $X_t$ and $Y_t$.)
Crosscorrelation
The crosscorrelation function of a pair of random processes is defined as
Equation:
$$R_{XY}(t_2,t_1) = E[X_{t_2}Y_{t_1}] = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} x y\, f_{X_{t_2},Y_{t_1}}(x,y)\, dx\, dy$$
Equation:


$$C_{XY}(t_2,t_1) = R_{XY}(t_2,t_1) - \mu_X(t_2)\mu_Y(t_1)$$


Jointly Wide Sense Stationary
The random processes $X_t$ and $Y_t$ are said to be jointly wide sense stationary if
$R_{XY}(t_2,t_1)$ is a function of $t_2 - t_1$ only and $\mu_X(t)$ and $\mu_Y(t)$ are constant.


Gaussian Processes 


Gaussian Random Processes 


Gaussian process 
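A process with mean $\mu_X(t)$ and covariance function $C_X(t_2,t_1)$ is said
to be a Gaussian process if any $X = (X_{t_1}, X_{t_2}, \ldots, X_{t_N})^T$ formed by
any sampling of the process is a Gaussian random vector, that is,


Equation:
$$f_X(x) = \frac{1}{(2\pi)^{\frac{N}{2}} (\det \Sigma_X)^{\frac{1}{2}}}\, e^{-\frac{1}{2}(x - \mu_X)^T \Sigma_X^{-1} (x - \mu_X)}$$
for all $x \in \mathbb{R}^N$ where
$$\mu_X = \begin{pmatrix} \mu_X(t_1) \\ \vdots \\ \mu_X(t_N) \end{pmatrix}$$
and
$$\Sigma_X = \begin{pmatrix} C_X(t_1,t_1) & \cdots & C_X(t_1,t_N) \\ \vdots & \ddots & \vdots \\ C_X(t_N,t_1) & \cdots & C_X(t_N,t_N) \end{pmatrix}$$


The complete statistical properties of $X_t$ can be obtained from the
second-order statistics.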


Properties 


1. If a Gaussian process is WSS, then it is strictly stationary. 

2. If two jointly Gaussian processes are uncorrelated, then they are also
statistically independent.

3. Any linear processing of a Gaussian process results in a Gaussian 
process. 


Example: 
X and Y are Gaussian and zero mean and independent. Z = X + Y is 
also Gaussian. 


Equation: 

$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma_X^2}}\, e^{-\frac{x^2}{2\sigma_X^2}}$$
for all $x \in \mathbb{R}$
Equation: 


therefore Z is also Gaussian. 


White and Coloured Processes 


White Noise 


If we have a zero-mean Wide Sense Stationary process $X$, it is a White
Noise Process if its ACF is a delta function at $\tau = 0$, i.e. it is of the form:
Equation:


$$r_{XX}(\tau) = P_X \delta(\tau)$$


where $P_X$ is a constant.


The PSD of $X$ is then given by
Equation:
$$S_X(\omega) = \int P_X \delta(\tau) e^{-j\omega\tau}\, d\tau = P_X e^{-j\omega 0} = P_X$$


Hence $X$ is white, since it contains equal power at all frequencies, as in
white light.


$P_X$ is the PSD of $X$ at all frequencies.


But:
Equation:
$$\text{Power of } X = \frac{1}{2\pi}\int_{-\infty}^{\infty} S_X(\omega)\, d\omega = \infty$$


so the White Noise Process is unrealizable in practice, because of its infinite 
bandwidth. 


However, it is very useful as a conceptual entity and as an approximation to
'nearly white' processes which have finite bandwidth, but which are 'white'
over all frequencies of practical interest. For 'nearly white' processes,
$r_{XX}(\tau)$ is a narrow pulse of non-zero width, and $S_X(\omega)$ is flat from zero up
to some relatively high cutoff frequency and then decays to zero above that.


Strict Whiteness and i.i.d. Processes 


Usually the above concept of whiteness is sufficient, but a much stronger 
definition is as follows: 


Pick a set of times $\{t_1, t_2, \ldots, t_N\}$ to sample $X(t)$.


If, for any choice of $\{t_1, t_2, \ldots, t_N\}$ with $N$ finite, the random variables
$X(t_1), X(t_2), \ldots, X(t_N)$ are jointly independent, i.e. their joint pdf is
given by

Equation:


$$f_{X(t_1),X(t_2),\ldots,X(t_N)}(x_1,x_2,\ldots,x_N) = \prod_{i=1}^{N} f_{X(t_i)}(x_i)$$


and the marginal pdfs are identical, i.e.
Equation:


$$f_{X(t_1)} = f_{X(t_2)} = \cdots = f_{X(t_N)} = f_X$$


then the process is termed Independent and Identically Distributed
(i.i.d.).


If, in addition, $f_X$ is a pdf with zero mean, we have a Strictly White Noise
Process.


An i.i.d. process is 'white' because the variables $X(t_i)$ and $X(t_j)$ are jointly
independent, even when separated by an infinitesimally small interval
between $t_i$ and $t_j$.


Additive White Gaussian Noise (AWGN) 


In many systems the concept of Additive White Gaussian Noise (AWGN) 
is used. This simply means a process which has a Gaussian pdf, a white 
PSD, and is linearly added to whatever signal we are analysing. 


Note that although 'white' and 'Gaussian' often go together, this is not
necessary (especially for 'nearly white' processes).


E.g. a very high speed random bit stream has an ACF which is
approximately a delta function, and hence is a nearly white process, but its
pdf is clearly not Gaussian: it is a pair of delta functions at $+V$ and $-V$,
the two voltage levels of the bit stream.


Conversely a nearly white Gaussian process which has been passed through 
a lowpass filter (See next section) will still have a Gaussian pdf (as it is a 
summation of Gaussians) but will no longer be white. 


Coloured Processes 


A random process whose PSD is not white or nearly white, is often known 
as a coloured noise process. 


We may obtain coloured noise $Y(t)$ with PSD $S_Y(\omega)$ simply by passing
white (or nearly white) noise $X(t)$ with PSD $P_X$ through a filter with
frequency response $H(\omega)$, such that, from this equation from our
discussion of Spectral Properties of Random Signals,

Equation:


$$S_Y(\omega) = S_X(\omega)\,|H(\omega)|^2 = P_X\,|H(\omega)|^2$$


Hence if we design the filter such that
Equation:


$$|H(\omega)| = \sqrt{\frac{S_Y(\omega)}{P_X}}$$


then $Y(t)$ will have the required coloured PSD.


For this to work, $S_X(\omega)$ need only be constant (white) over the passband of
the filter, so a nearly white process which satisfies this criterion is quite
satisfactory and realizable.


Using this equation from our discussion of Spectral Properties of Random
Signals and [link], the ACF of the coloured noise is given by
Equation:


$$r_{YY}(\tau) = \mathcal{F}^{-1}\{S_Y(\omega)\} = \mathcal{F}^{-1}\{P_X |H(\omega)|^2\} = P_X\, \mathcal{F}^{-1}\{|H(\omega)|^2\} = P_X\, h^*(-\tau) * h(\tau)$$

where h(7) is the impulse response of the filter. 
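A discrete-time analogue makes this relation concrete. The sketch below assumes a real FIR filter with taps `h` driven by white samples whose ACF is $P_X \delta[k]$; under that assumption the output ACF is $P_X \sum_n h[n]\,h[n+k]$. The function name and the discrete-time setting are choices made for this illustration and are not taken from the text.

```python
import math

def coloured_acf(h, p_x=1.0):
    """ACF of filtered white noise for lags -(L-1)..(L-1).

    Implements r_YY[k] = P_X * sum_n h[n] * h[n + k] for a real FIR filter h.
    """
    length = len(h)
    acf = []
    for k in range(-(length - 1), length):
        s = 0.0
        for n in range(length):
            if 0 <= n + k < length:
                s += h[n] * h[n + k]
        acf.append(p_x * s)
    return acf

if __name__ == "__main__":
    # Half-sine filter of length 10 samples, similar in spirit to the figure discussed.
    taps = [math.sin(math.pi * (n + 0.5) / 10) for n in range(10)]
    print(coloured_acf(taps))
```

A longer half-sine filter gives a wider ACF, which corresponds to a narrower (less flat) PSD.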


This Figure from previous discussion shows two examples of coloured 
noise, although the upper waveform is more 'nearly white’ than the lower 
one, as can be seen in part c of this figure from previous discussion in 
which the upper PSD is flatter than the lower PSD. In these cases, the 
coloured waveforms were produced by passing uncorrelated random noise 
samples (white up to half the sampling frequency) through half-sine filters 
(as in this equation from our discussion of Random Signals) of length 

T, = 10 and 50 samples respectively. 


Linear Filtering 


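Equation:
Integration
$$Z(\omega) = \int_a^b X_t(\omega)\, dt$$
Equation:
Linear Processing
$$Y_t = \int_{-\infty}^{\infty} h(t,\tau) X_\tau\, d\tau$$
Equation:
Differentiation
$$X_t' = \frac{d}{dt}(X_t)$$
Properties


1. $E[Z] = E\left[\int_a^b X_t(\omega)\, dt\right] = \int_a^b \mu_X(t)\, dt$


2. $E[Z^2] = E\left[\int_a^b X_{t_2}\, dt_2 \int_a^b X_{t_1}\, dt_1\right] = \int_a^b \int_a^b R_X(t_2,t_1)\, dt_1\, dt_2$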


Equation: 


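$$\mu_Y(t) = E\left[\int_{-\infty}^{\infty} h(t,\tau) X_\tau\, d\tau\right] = \int_{-\infty}^{\infty} h(t,\tau)\mu_X(\tau)\, d\tau$$


If $X_t$ is wide sense stationary and the linear system is time invariant
Equation:


$$\mu_Y(t) = \int_{-\infty}^{\infty} h(t - \tau)\mu_X\, d\tau = \mu_X \int_{-\infty}^{\infty} h(t')\, dt'$$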
Equation: 
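$$R_{YX}(t_2,t_1) = E[Y_{t_2}X_{t_1}] = E\left[\int_{-\infty}^{\infty} h(t_2 - \tau)X_\tau\, d\tau\; X_{t_1}\right] = \int_{-\infty}^{\infty} h(t_2 - \tau)R_X(\tau - t_1)\, d\tau$$
Equation:


$$R_{YX}(t_2,t_1) = \int_{-\infty}^{\infty} h(t_2 - t_1 - \tau')R_X(\tau')\, d\tau' = h * R_X(t_2 - t_1)$$


where $\tau' = \tau - t_1$.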
Equation: 


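$$\begin{aligned} R_Y(t_2,t_1) &= E[Y_{t_2}Y_{t_1}] \\ &= E\left[Y_{t_2}\int_{-\infty}^{\infty} h(t_1,\tau)X_\tau\, d\tau\right] \\ &= \int_{-\infty}^{\infty} h(t_1,\tau)R_{YX}(t_2,\tau)\, d\tau \\ &= \int_{-\infty}^{\infty} h(t_1 - \tau)R_{YX}(t_2 - \tau)\, d\tau \end{aligned}$$


Equation:

$$R_Y(t_2,t_1) = \int_{-\infty}^{\infty} h(\tau' - (t_2 - t_1))R_{YX}(\tau')\, d\tau' = R_Y(t_2 - t_1) = \tilde{h} * R_{YX}(t_2 - t_1)$$


where $\tau' = t_2 - \tau$ and $\tilde{h}(\tau) = h(-\tau)$ for all $\tau \in \mathbb{R}$. $Y_t$ is WSS if $X_t$ is
WSS and the linear system is time-invariant.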


Example: 

$X_t$ is a wide sense stationary process with $\mu_X = 0$, and $R_X(\tau) = \frac{N_0}{2}\delta(\tau)$.
Consider the random process going through a filter with impulse response
$h(t) = e^{-at}u(t)$. The output process is denoted by $Y_t$. $\mu_Y(t) = 0$ for all
$t$.

Equation:


$$R_Y(\tau) = \frac{N_0}{2}\int_{-\infty}^{\infty} h(\alpha)h(\alpha - \tau)\, d\alpha = \frac{N_0}{2}\,\frac{e^{-a|\tau|}}{2a}$$


$X_t$ is called a white process. $Y_t$ is a Markov process.
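The output autocorrelation of this example can be checked by evaluating the integral numerically. The sketch below assumes $h(t) = e^{-at}u(t)$ with an illustrative value $a = 2$ and $N_0 = 1$, and uses a simple midpoint rule truncated at a finite upper limit; all names and numerical parameters are assumptions for the example.

```python
import math

def ry_numeric(tau, a=2.0, n0=1.0, upper=20.0, steps=100_000):
    """Midpoint-rule estimate of (N0/2) * integral of h(alpha) * h(alpha - tau)."""
    lower = max(0.0, tau)  # the integrand vanishes unless alpha >= 0 and alpha >= tau
    d = (upper - lower) / steps
    total = 0.0
    for k in range(steps):
        alpha = lower + (k + 0.5) * d
        total += math.exp(-a * alpha) * math.exp(-a * (alpha - tau))
    return 0.5 * n0 * total * d

def ry_closed_form(tau, a=2.0, n0=1.0):
    """Closed form (N0/2) * exp(-a|tau|) / (2a)."""
    return n0 / (4.0 * a) * math.exp(-a * abs(tau))

if __name__ == "__main__":
    for tau in (-0.5, 0.0, 0.7):
        print(tau, ry_numeric(tau), ry_closed_form(tau))
```

The numerical and closed-form values should agree for positive and negative lags, reflecting the even symmetry of $R_Y(\tau)$.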


Power Spectral Density 


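The power spectral density function of a wide sense stationary (WSS)
process $X_t$ is defined to be the Fourier transform of the autocorrelation
function of $X_t$.

Equation:


$$S_X(f) = \int_{-\infty}^{\infty} R_X(\tau) e^{-j2\pi f\tau}\, d\tau$$
if $X_t$ is WSS with autocorrelation function $R_X(\tau)$.


Properties


1. $S_X(f) = S_X(-f)$ since $R_X$ is even and real.
2. $\mathrm{Var}(X_t) = R_X(0) = \int_{-\infty}^{\infty} S_X(f)\, df$ (for a zero-mean process)
3. $S_X(f)$ is real and nonnegative: $S_X(f) \ge 0$ for all $f$.


If $Y_t = \int_{-\infty}^{\infty} h(t - \tau)X_\tau\, d\tau$ then
Equation:


$$S_Y(f) = \mathcal{F}(R_Y(\tau)) = \mathcal{F}(h * \tilde{h} * R_X(\tau)) = H(f)\tilde{H}(f)S_X(f) = |H(f)|^2 S_X(f)$$


since $\tilde{H}(f) = \int_{-\infty}^{\infty} \tilde{h}(t) e^{-j2\pi ft}\, dt = \overline{H(f)}$


Example:
$X_t$ is a white process and $h(t) = e^{-at}u(t)$.
Equation:

$$H(f) = \frac{1}{a + j2\pi f}$$

Equation:


$$S_Y(f) = \frac{\frac{N_0}{2}}{a^2 + 4\pi^2 f^2}$$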
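As a consistency check, the PSD of this example can be recovered by Fourier transforming the output autocorrelation $R_Y(\tau) = \frac{N_0}{4a}e^{-a|\tau|}$ numerically. The sketch below assumes $a = 2$ and $N_0 = 1$ for illustration and exploits the even symmetry of $R_Y$ so that only a cosine transform is needed; the function names and step sizes are assumptions.

```python
import math

def psd_from_acf(f, a=2.0, n0=1.0, upper=20.0, steps=100_000):
    """Midpoint-rule Fourier transform of R_Y(tau) = N0/(4a) * exp(-a|tau|)."""
    d = upper / steps
    total = 0.0
    for k in range(steps):
        tau = (k + 0.5) * d
        r_y = n0 / (4.0 * a) * math.exp(-a * tau)
        total += r_y * math.cos(2.0 * math.pi * f * tau)
    return 2.0 * total * d  # R_Y is even, so double the one-sided integral

def psd_closed_form(f, a=2.0, n0=1.0):
    """Closed form (N0/2) / (a^2 + 4*pi^2*f^2), i.e. |H(f)|^2 * N0/2."""
    return (n0 / 2.0) / (a * a + (2.0 * math.pi * f) ** 2)

if __name__ == "__main__":
    for f in (0.0, 0.5, 1.3):
        print(f, psd_from_acf(f), psd_closed_form(f))
```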


Information Theory and Coding 


In the previous chapters, we considered the problem of digital transmission 
over different channels. Information sources are not often digital, and in 
fact, many sources are analog. Although many channels are also analog, it 
is still more efficient to convert analog sources into digital data and transmit 
over analog channels using digital transmission techniques. There are two 
reasons why digital transmission could be more efficient and more reliable 
than analog transmission: 


1. Analog sources could be compressed to digital form efficiently. 
2. Digital data can be transmitted over noisy channels reliably. 


There are several key questions that need to be addressed: 


1. How can one model information? 

2. How can one quantify information? 

3. If information can be measured, does its information quantity relate to 
how much it can be compressed? 

4. Is it possible to determine if a particular channel can handle 
transmission of a source with a particular information quantity? 




Example: 

The information content of the following sentences: "Hello, hello, hello." 
and "There is an exam today." are not the same. Clearly the second one 
carries more information. The first one can be compressed to "Hello" 
without much loss of information. 


In other modules, we will quantify information and find efficient 
representation of information (Entropy). We will also quantify how much 
information can be transmitted through channels, reliably. Channel coding 
can be used to reduce information rate and increase reliability. 


Entropy 


Information sources take very different forms. Since the information is not known 
to the destination, it is then best modeled as a random process, discrete-time or 
continuous time. 


Here are a few examples: 


Digital data source (e.g., a text) can be modeled as a discrete-time and discrete
valued random process $X_1, X_2, \ldots$, where $X_i \in \{A, B, C, D, E, \ldots\}$ with a
particular $p_{X_1}(x), p_{X_2}(x), \ldots$, and a specific $p_{X_1X_2}, p_{X_2X_3}, \ldots$, and $p_{X_1X_2X_3}, p_{X_2X_3X_4}, \ldots$,
etc.

Video signals can be modeled as a continuous time random process. The 
power spectral density is bandlimited to around 5 MHz (the value depends on 
the standards used to raster the frames of image). 

Audio signals can be modeled as a continuous-time random process. It has 
been demonstrated that the power spectral density of speech signals is 
bandlimited between 300 Hz and 3400 Hz. For example, the speech signal can 
be modeled as a Gaussian process with the shown power spectral density over 
a small observation period. 


(Figure: power spectral density $S_X(f)$ of a speech signal, occupying the band from 300 Hz to 3400 Hz.)


These analog information signals are bandlimited. Therefore, if sampled faster than 
the Nyquist rate, they can be reconstructed from their sample values. 


Example: 
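A speech signal with bandwidth of 3100 Hz can be sampled at the rate of 6.2 kHz.
If the samples are quantized with an 8-level quantizer then the speech signal can be
represented with a binary sequence with the rate of
Equation:


$$6.2 \times 10^3\, \frac{\text{samples}}{\text{sec}} \times \log_2 8\, \frac{\text{bits}}{\text{sample}} = 18600\, \frac{\text{bits}}{\text{sec}} = 18.6\, \frac{\text{kbits}}{\text{sec}}$$


(Figure: a speech signal sampled every $\frac{1}{6.2 \times 10^3}$ seconds and represented by the bit stream 0011011010111100.)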


The sampled real values can be quantized to create a discrete-time discrete-valued 
random process. Since any bandlimited analog information signal can be 
converted to a sequence of discrete random variables, we will continue the 
discussion only for discrete random variables. 


Example: 

The random variable $x$ takes the value of 0 with probability 0.9 and the value of 1
with probability 0.1. The statement that $x = 1$ carries more information than the
statement that $x = 0$. The reason is that $x$ is expected to be 0; therefore, knowing
that $x = 1$ is more surprising news. An intuitive definition of an information
measure should be larger when the probability is small.


Example: 

The information content in the statement about the temperature and pollution level
on July 15th in Chicago should be the sum of the information in the statement that
it was hot and the information in the statement that it was highly polluted, since
pollution and temperature could be independent.

Equation: 


I(hot, high) = I(hot) + I(high) 


An intuitive and meaningful measure of information should have the following 
properties: 


1. Self information should decrease with increasing probability. 
2. Self information of two independent events should be their sum. 
3. Self information should be a continuous function of the probability. 


The only function satisfying the above conditions is the -log of the probability. 


Entropy 
The entropy (average self information) of a discrete random variable X is a 
function of its probability mass function and is defined as 
Equation: 


$$H(X) = -\sum_{i=1}^{N} p_X(x_i) \log p_X(x_i)$$


where $N$ is the number of possible values of $X$ and $p_X(x_i) = \Pr[X = x_i]$.
If log is base 2 then the unit of entropy is bits. Entropy is a measure of 
uncertainty in a random variable and a measure of information it can reveal. 
A more basic explanation of entropy is provided in another module. 


Example: 

Suppose a source produces binary information {0, 1} with probabilities $p$ and $1 - p$. The
entropy of the source is

Equation:


$$H(X) = -p \log_2 p - (1 - p) \log_2 (1 - p)$$


If $p = 0$ then $H(X) = 0$, if $p = 1$ then $H(X) = 0$, if $p = 1/2$ then $H(X) = 1$
bit. The source has its largest entropy if $p = 1/2$ and the source provides no new
information if $p = 0$ or $p = 1$.


(Figure: binary entropy $H(p)$ plotted against $p$, with its maximum at $p = \frac{1}{2}$.)
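The binary entropy function is easy to evaluate directly. The short sketch below uses the usual convention $0 \log_2 0 = 0$; the function name is an assumption made for this illustration.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary source with probabilities p and 1 - p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # convention: 0 * log2(0) = 0, so a certain outcome has zero entropy
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

if __name__ == "__main__":
    for p in (0.0, 0.1, 0.5, 0.9, 1.0):
        print(p, binary_entropy(p))
```

The function is symmetric about $p = 1/2$, where it reaches its maximum of 1 bit.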


Example: 
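An analog source is modeled as a continuous-time random process with power
spectral density bandlimited to the band between 0 and 4000 Hz. The signal is
sampled at the Nyquist rate. The sequence of random variables, as a result of
sampling, are assumed to be independent. The samples are quantized to 5 levels
$\{-2, -1, 0, 1, 2\}$. The probabilities of the samples taking the quantized values are
$\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}\}$, respectively. The entropy of the random variables is
Equation:


$$\begin{aligned} H(X) &= -\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{4}\log_2 \tfrac{1}{4} - \tfrac{1}{8}\log_2 \tfrac{1}{8} - \tfrac{1}{16}\log_2 \tfrac{1}{16} - \tfrac{1}{16}\log_2 \tfrac{1}{16} \\ &= \tfrac{1}{2}\log_2 2 + \tfrac{1}{4}\log_2 4 + \tfrac{1}{8}\log_2 8 + \tfrac{1}{16}\log_2 16 + \tfrac{1}{16}\log_2 16 \\ &= \tfrac{1}{2} + \tfrac{1}{2} + \tfrac{3}{8} + \tfrac{1}{4} + \tfrac{1}{4} \\ &= \tfrac{15}{8}\, \frac{\text{bits}}{\text{sample}} \end{aligned}$$


There are 8000 samples per second. Therefore, the source produces
$8000 \times \frac{15}{8} = 15000\, \frac{\text{bits}}{\text{sec}}$ of information.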
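The arithmetic of this example can be reproduced in a few lines. The sketch assumes the five probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{16}, \frac{1}{16}$ and the Nyquist sampling rate of 8000 samples per second used in the example; the variable and function names are illustrative.

```python
import math

LEVEL_PROBS = [1 / 2, 1 / 4, 1 / 8, 1 / 16, 1 / 16]  # probabilities of the 5 levels
SAMPLE_RATE_HZ = 8000                                 # Nyquist rate for a 4000 Hz band

def entropy_bits(probs):
    """Entropy in bits of a discrete distribution given as a list of probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

entropy_per_sample = entropy_bits(LEVEL_PROBS)         # bits per sample
information_rate = SAMPLE_RATE_HZ * entropy_per_sample  # bits per second

if __name__ == "__main__":
    print(entropy_per_sample, information_rate)
```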


Joint Entropy 
The joint entropy of two discrete random variables (X, Y) is defined by 
Equation: 


$$H(X,Y) = -\sum_{i}\sum_{j} p_{X,Y}(x_i,y_j) \log p_{X,Y}(x_i,y_j)$$


The joint entropy for a random vector $X = (X_1, X_2, \ldots, X_n)^T$ is defined as
Equation:


$$H(X) = -\sum_{x_1}\sum_{x_2}\cdots\sum_{x_n} p_X(x_1,x_2,\ldots,x_n) \log p_X(x_1,x_2,\ldots,x_n)$$


Conditional Entropy 
The conditional entropy of the random variable X given the random variable 
Y is defined by 
Equation: 


$$H(X|Y) = -\sum_{i}\sum_{j} p_{X,Y}(x_i,y_j) \log p_{X|Y}(x_i|y_j)$$


It is easy to show that 
Equation: 

and 

Equation: 


H(X,Y) = H(Y)+H(X|Y) 
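The relation $H(X,Y) = H(Y) + H(X|Y)$ can be confirmed numerically on any small joint pmf. The sketch below computes both sides from the definitions; the dictionary representation of the joint pmf and the function names are assumptions made for this example.

```python
import math

def entropy_bits(pmf):
    """Entropy in bits of a pmf given as a dict mapping outcomes to probabilities."""
    return -sum(p * math.log2(p) for p in pmf.values() if p > 0)

def chain_rule_sides(joint):
    """Return (H(X,Y), H(Y) + H(X|Y)) for a joint pmf keyed by (x, y) pairs."""
    p_y = {}
    for (_, y), p in joint.items():
        p_y[y] = p_y.get(y, 0.0) + p          # marginal pmf of Y
    h_x_given_y = -sum(
        p * math.log2(p / p_y[y])             # p / p_Y(y) is p_{X|Y}(x|y)
        for (_, y), p in joint.items()
        if p > 0
    )
    return entropy_bits(joint), entropy_bits(p_y) + h_x_given_y

if __name__ == "__main__":
    example = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}
    print(chain_rule_sides(example))
```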


If $X_1, X_2, \ldots, X_n$ are mutually independent it is easy to show that
Equation:


$$H(X) = \sum_{i=1}^{n} H(X_i)$$


Entropy Rate 
The entropy rate of a stationary discrete-time random process is defined by 
Equation: 


$$H = \lim_{n \to \infty} H(X_n \mid X_1 X_2 \ldots X_{n-1})$$


The limit exists and is equal to
Equation:


$$H = \lim_{n \to \infty} \frac{1}{n} H(X_1, X_2, \ldots, X_n)$$
The entropy rate is a measure of the uncertainty of information content per 
output symbol of the source. 


Entropy is closely tied to source coding. The extent to which a source can be 
compressed is related to its entropy. In 1948, Claude E. Shannon introduced a 
theorem which related the entropy to the number of bits per second required to 
represent a source without much loss. 


Source Coding 


As mentioned earlier, how much a source can be compressed should be 
related to its entropy. In 1948, Claude E. Shannon introduced three 
theorems and developed very rigorous mathematics for digital 
communications. In one of the three theorems, Shannon relates entropy to 
the minimum number of bits per second required to represent a source 
without much loss (or distortion). 


Consider a source that is modeled by a discrete-time and discrete-valued
random process $X_1, X_2, \ldots, X_n, \ldots$ where $x_i \in \{a_1, a_2, \ldots, a_N\}$ and
define $p_{X_i}(x_i = a_j) = p_j$ for $j = 1, 2, \ldots, N$, where it is assumed that $X_1$,
$X_2, \ldots, X_n$ are mutually independent and identically distributed.


Consider a sequence of length n 
Equation: 


The symbol $a_1$ can occur with probability $p_1$. Therefore, in a sequence of
length $n$, on the average, $a_1$ will appear $np_1$ times with high probability if
$n$ is very large.


Therefore, 
Equation: 


Equation: 


where $p_i = P(X_j = a_i)$ for all $j$ and for all $i$.


A typical sequence $X$ may look like
Equation:


$$X = (a_2, a_1, a_N, a_2, \ldots, a_1, a_N, a_6)^T$$


where $a_i$ appears $np_i$ times with large probability. This is referred to as a
typical sequence. The probability of $X$ being a typical sequence is
Equation:


$$P(X = x) \approx \prod_{i=1}^{N} p_i^{np_i} = \prod_{i=1}^{N} \left(2^{\log_2 p_i}\right)^{np_i} = \prod_{i=1}^{N} 2^{np_i \log_2 p_i} = 2^{n\sum_{i=1}^{N} p_i \log_2 p_i} = 2^{-nH(X)}$$


where $H(X)$ is the entropy of the random variables $X_1, X_2, \ldots, X_n$.


For large $n$, almost all the output sequences of length $n$ of the source are
equally probable with probability $\approx 2^{-nH(X)}$. These are typical
sequences. The probability of nontypical sequences is negligible. There
are $N^n$ different sequences of length $n$ with alphabet of size $N$. The
probability of typical sequences is almost 1.

Equation: 


$$\sum_{k=1}^{\#\text{ of typical seq.}} 2^{-nH(X)} \approx 1$$


(Figure: the set of all sequences of length $n$, containing the set of typical sequences; the remaining sequences are nontypical.)


Example: 

Consider a source with alphabet {A,B,C,D} with probabilities $\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$.
Assume $X_1, X_2, \ldots, X_8$ is an independent and identically distributed
sequence with $X_i \in \{A, B, C, D\}$ with the above probabilities.
Equation:


$$H(X) = -\tfrac{1}{2}\log_2 \tfrac{1}{2} - \tfrac{1}{4}\log_2 \tfrac{1}{4} - \tfrac{1}{8}\log_2 \tfrac{1}{8} - \tfrac{1}{8}\log_2 \tfrac{1}{8} = \tfrac{14}{8}\ \text{bits}$$


The number of typical sequences of length 8
Equation:


$$2^{8 \times \frac{14}{8}} = 2^{14}$$


The number of nontypical sequences

$$4^8 - 2^{14} = 2^{16} - 2^{14} = 3 \times 2^{14}$$

Examples of typical sequences include those with A appearing $8 \times \frac{1}{2} = 4$
times, B appearing $8 \times \frac{1}{4} = 2$ times, etc.: {A,D,B,B,A,A,C,A},
{A,A,A,A,C,D,B,B} and many more.

Examples of nontypical sequences of length 8: {D,D,B,C,C,A,B,D},
{C,C,C,C,C,B,C,C} and many more. Indeed, these definitions and
arguments are valid when $n$ is very large. The probability of a source
output to be in the set of typical sequences approaches 1 as $n \to \infty$. The
probability of a source output to be in the set of nontypical sequences
approaches 0 as $n \to \infty$.
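The counts in this example follow directly from the entropy. The sketch below assumes the probabilities $\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}$ for A, B, C, D and sequence length 8, and also evaluates the probability of individual sequences; the names are illustrative choices.

```python
import math

PROBS = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}  # assumed source probabilities
SEQ_LEN = 8

def entropy_bits(probs):
    """Entropy in bits of a pmf given as a dict of symbol probabilities."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)

def sequence_probability(seq, probs=PROBS):
    """Probability of a sequence of i.i.d. symbols."""
    prob = 1.0
    for symbol in seq:
        prob *= probs[symbol]
    return prob

h = entropy_bits(PROBS)                   # bits per symbol
num_typical = 2 ** round(SEQ_LEN * h)     # about 2^(nH) typical sequences
num_all = len(PROBS) ** SEQ_LEN           # N^n sequences in total
num_nontypical = num_all - num_typical

if __name__ == "__main__":
    print(h, num_typical, num_nontypical)
    print(sequence_probability("ADBBAACA"), 2.0 ** (-SEQ_LEN * h))
```

A sequence with the typical symbol counts, such as ADBBAACA, has probability exactly $2^{-nH(X)}$, while a nontypical one such as CCCCCBCC is far less likely.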


The essence of source coding or data compression is that as $n \to \infty$,
nontypical sequences almost never appear as the output of the source. Therefore,
one only needs to be able to represent typical sequences as binary codes and
ignore nontypical sequences. Since there are only $2^{nH(X)}$ typical sequences
of length $n$, it takes $nH(X)$ bits to represent them on the average. On the
average it takes $H(X)$ bits per source output to represent a simple source
that produces independent and identically distributed outputs.

Theorem 

Shannon's Source-Coding 


A source that produces independent and identically distributed random
variables with entropy $H$ can be encoded with arbitrarily small error
probability at any rate $R$ in bits per source output if $R > H$. Conversely, if
$R < H$, the error probability will be bounded away from zero, independent
of the complexity of coder and decoder.


The source coding theorem proves existence of source coding techniques 
that achieve rates close to the entropy but does not provide any algorithms 
or ways to construct such codes. 


If the source is not i.i.d. (independent and identically distributed), but it is
stationary with memory, then a similar theorem applies with the entropy
$H(X)$ replaced with the entropy rate $H = \lim_{n \to \infty} H(X_n \mid X_1 X_2 \ldots X_{n-1})$.


In the case of a source with memory, the more the source produces outputs 
the more one knows about the source and the more one can compress. 


Example: 

The English language has 26 letters; with space it becomes an alphabet of
size 27. If modeled as a memoryless source (no dependency between
letters in a word) then the entropy is $H(X) = 4.03$ bits/letter.

If the dependency between letters in a text is captured in a model, the
entropy rate can be derived to be $H = 1.3$ bits/letter. Note that a
non-information-theoretic representation of a text may require 5 bits/letter since
$2^5 = 32$ is the smallest power of 2 that is at least 27. Shannon's results indicate that there may
be a compression algorithm with the rate of 1.3 bits/letter.


Although Shannon's results are not constructive, there are a number of 
source coding algorithms for discrete time discrete valued sources that 
come close to Shannon's bound. One such algorithm is the Huffman source 
coding algorithm. Another is the Lempel and Ziv algorithm. 


Huffman codes and Lempel and Ziv apply to compression problems where
the source produces discrete time and discrete valued outputs. For cases
where the source is analog there are powerful compression algorithms that
specify all the steps from sampling, quantization, and binary
representation. These are referred to as waveform coders. JPEG, MPEG, and
vocoders are a few examples for image, video, and voice, respectively.


Huffman Coding 


One particular source coding algorithm is the Huffman encoding algorithm. 
It is a source coding algorithm which approaches, and sometimes achieves, 
Shannon's bound for source compression. A brief discussion of the 
algorithm is also given in another module. 


Huffman encoding algorithm 


1. Sort source outputs in decreasing order of their probabilities.

2. Merge the two least-probable outputs into a single output whose
probability is the sum of the corresponding probabilities.

3. If the number of remaining outputs is more than 2, then go to step 1.

4. Arbitrarily assign 0 and 1 as codewords for the two remaining outputs.

5. If an output is the result of the merger of two outputs in a preceding
step, append a 0 and a 1 to the current codeword to obtain the
codewords of the two preceding outputs, and repeat step 5. If no output is
preceded by another output in a preceding step, then stop.
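The steps above can be sketched in code. The version below is a minimal illustration and not the only way to organize the algorithm: it uses a priority queue in place of repeated sorting, and it builds each codeword by prepending a bit at every merge, which yields the same codewords as assigning bits backwards in steps 4 and 5. The function name and the dictionary interface are assumptions made for this example.

```python
import heapq
import itertools

def huffman_code(probs):
    """Return a dict mapping each symbol to its binary Huffman codeword."""
    tie = itertools.count()  # tie-breaker so the heap never compares dicts
    heap = [(p, next(tie), {symbol: ""}) for symbol, p in probs.items()]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate single-symbol source: give it a one-bit codeword.
        return {symbol: "0" for symbol in probs}
    while len(heap) > 1:
        p0, _, group0 = heapq.heappop(heap)  # least probable output
        p1, _, group1 = heapq.heappop(heap)  # second least probable output
        merged = {s: "0" + w for s, w in group0.items()}
        merged.update({s: "1" + w for s, w in group1.items()})
        heapq.heappush(heap, (p0 + p1, next(tie), merged))
    return heap[0][2]

def average_length(probs, code):
    """Average codeword length in bits per source output."""
    return sum(probs[s] * len(code[s]) for s in probs)

if __name__ == "__main__":
    source = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
    code = huffman_code(source)
    print(code, average_length(source, code))
```

Because ties are broken arbitrarily, the exact bit patterns may differ from a hand-built tree, but the codeword lengths and the average length are the same.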




Example: 
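$X \in \{A, B, C, D\}$ with probabilities $\{\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\}$

(Figure: Huffman code tree and codeword table for this source. A receives the one-bit codeword 0, B a two-bit codeword, and C and D three-bit codewords.)


Average length $= \frac{1}{2} \cdot 1 + \frac{1}{4} \cdot 2 + \frac{1}{8} \cdot 3 + \frac{1}{8} \cdot 3 = \frac{14}{8}$. As you may recall, the
entropy of the source was also $H(X) = \frac{14}{8}$. In this case, the Huffman
code achieves the lower bound of $\frac{14}{8}\, \frac{\text{bits}}{\text{output}}$.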


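In general, we can define average code length as
Equation:


$$\bar{\ell} = \sum_{x \in \mathcal{X}} p_X(x)\, \ell(x)$$


where $\mathcal{X}$ is the set of possible values of $x$ and $\ell(x)$ is the length of the codeword assigned to $x$.


It is not very hard to show that
Equation:


$$H(X) \le \bar{\ell} < H(X) + 1$$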


For compressing single source output at a time, Huffman codes provide 
nearly optimum code lengths. 


The drawbacks of Huffman coding 


1. Codes are variable length.
2. The algorithm requires the knowledge of the probabilities, $p_X(x)$ for
all $x$ in the set of possible values $\mathcal{X}$.


Another powerful source coder that does not have the above shortcomings 
is Lempel and Ziv. 


Data Transmission and Reception 


We will develop the idea of data transmission by first considering simple 
channels. In additional modules, we will consider more practical channels; 
baseband channels with bandwidth constraints and passband channels. 
Simple additive white Gaussian channels 


(Figure: simple additive white Gaussian channel. $X_t$ carries data, $N_t$ is a white
Gaussian random process.)


The concept of using different types of modulation for transmission of data 
is introduced in the module Signalling. The problem of demodulation and 
detection of signals is discussed in Demodulation and Detection. 


Signalling 


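Example:


Data symbols are "1" or "0" and data rate is $\frac{1}{T}$ Hertz.
Pulse amplitude modulation (PAM)


(Figure: a data bit stream and the corresponding PAM signal $X_t$, which takes the amplitudes $A$ and $-A$.)


Pulse position modulation


(Figure: a data bit stream and the corresponding pulse-position-modulated signal $X_t$ of amplitude $A$.)


Example:
Data symbols are "1" or "0" and the data rate is $\frac{2}{T}$ Hertz.


(Figure: each pair of bits (00, 01, 10, ...) is mapped to its own waveform of duration $T$.)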
This strategy is an alternative to PAM with half the period, $\frac{T}{2}$.


(Figure: waveforms assigned to the bit pairs 00, 01, ...)

Relevant measures are energy of modulated signals
Equation:


$$E_m = \int_0^T s_m^2(t)\, dt \quad \text{for all } m \in \{1, 2, \ldots, M\}$$


and how different they are in terms of inner products.


Equation:
$$\langle s_m, s_n \rangle = \int_0^T s_m(t) s_n(t)\, dt$$
for $m \in \{1, 2, \ldots, M\}$ and $n \in \{1, 2, \ldots, M\}$.
antipodal 


Signals $s_1(t)$ and $s_2(t)$ are antipodal if
$s_2(t) = -s_1(t)$ for all $t \in [0, T]$.


orthogonal
Signals $s_1(t), s_2(t), \ldots, s_M(t)$ are orthogonal if $\langle s_m, s_n \rangle = 0$ for
$m \ne n$.

biorthogonal
Signals $s_1(t), s_2(t), \ldots, s_M(t)$ are biorthogonal if $s_1(t), \ldots, s_{M/2}(t)$ are
orthogonal and $s_m(t) = -s_{\frac{M}{2}+m}(t)$ for all $m \in \{1, 2, \ldots, \frac{M}{2}\}$.


It is quite intuitive to expect that the smaller (the more negative) the inner
products $\langle s_m, s_n \rangle$ for all $m \ne n$, the better the signal set.


Simplex signals 
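Let $\{s_1(t), s_2(t), \ldots, s_M(t)\}$ be a set of orthogonal signals with equal
energy. The signals $\tilde{s}_1(t), \ldots, \tilde{s}_M(t)$ are simplex signals if
Equation:


$$\tilde{s}_m(t) = s_m(t) - \frac{1}{M}\sum_{k=1}^{M} s_k(t)$$


If the energy of orthogonal signals is denoted by
Equation:


$$E_s = \int_0^T s_m^2(t)\, dt \quad \text{for all } m \in \{1, 2, \ldots, M\}$$


then the energy of simplex signals
Equation:


$$E_{\tilde{s}} = \left(1 - \frac{1}{M}\right) E_s$$


and
Equation:


$$\langle \tilde{s}_m, \tilde{s}_n \rangle = \frac{-1}{M - 1} E_{\tilde{s}} \quad \text{for all } m \ne n$$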

It is conjectured that among all possible M-ary signals with equal energy, 
the simplex signal set results in the smallest probability of error when used 
to transmit information through an additive white Gaussian noise channel. 
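The energy and inner-product properties of simplex signals can be checked on vectors. The sketch below assumes the standard construction in which the average of the orthogonal set is subtracted from each signal, and represents the orthogonal signals by coordinate vectors of energy $E_s$; the function names are assumptions made for this illustration.

```python
def inner(u, v):
    """Inner product of two coordinate vectors."""
    return sum(a * b for a, b in zip(u, v))

def orthogonal_set(m, energy):
    """M orthogonal vectors of equal energy (scaled unit vectors)."""
    root = energy ** 0.5
    return [[root if i == k else 0.0 for i in range(m)] for k in range(m)]

def simplex_from_orthogonal(vectors):
    """Subtract the average of the set from each vector."""
    m = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[i] for v in vectors) / m for i in range(dim)]
    return [[v[i] - mean[i] for i in range(dim)] for v in vectors]

if __name__ == "__main__":
    simplex = simplex_from_orthogonal(orthogonal_set(4, 1.0))
    print(inner(simplex[0], simplex[0]))  # energy of a simplex signal
    print(inner(simplex[0], simplex[1]))  # inner product of two distinct signals
```

For $M = 4$ and $E_s = 1$ the simplex energy is $1 - \frac{1}{M}$ of the original, and every pair of distinct signals has the same negative inner product.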


The geometric representation of signals can provide a compact description 
of signals and can simplify performance analysis of communication systems 
using the signals. 


Once signals have been modulated, the receiver must detect and demodulate 
the signals despite interference and noise and decide which of the set of 
possible transmitted signals was sent. 


Geometric Representation of Modulation Signals 


Geometric representation of signals can provide a compact characterization 
of signals and can simplify analysis of their performance as modulation 
signals. 


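Orthonormal bases are essential in geometry. Let $\{s_1(t), s_2(t), \ldots, s_M(t)\}$
be a set of signals.


Define $\psi_1(t) = \frac{s_1(t)}{\sqrt{E_1}}$ where $E_1 = \int_0^T s_1^2(t)\, dt$.

Define $s_{21} = \langle s_2, \psi_1 \rangle = \int_0^T s_2(t)\psi_1(t)\, dt$ and


$\psi_2(t) = \frac{1}{\sqrt{\hat{E}_2}}(s_2(t) - s_{21}\psi_1(t))$ where $\hat{E}_2 = \int_0^T (s_2(t) - s_{21}\psi_1(t))^2\, dt$


In general
Equation:


$$\psi_k(t) = \frac{1}{\sqrt{\hat{E}_k}}\left(s_k(t) - \sum_{j=1}^{k-1} s_{kj}\psi_j(t)\right)$$


where $\hat{E}_k = \int_0^T \left(s_k(t) - \sum_{j=1}^{k-1} s_{kj}\psi_j(t)\right)^2 dt$.


The process continues until all of the $M$ signals are exhausted. The results
are $N$ orthogonal signals with unit energy, $\{\psi_1(t), \psi_2(t), \ldots, \psi_N(t)\}$
where $N \le M$. If the signals $\{s_1(t), \ldots, s_M(t)\}$ are linearly independent,
then $N = M$.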

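The $M$ signals can be represented as
Equation:


$$s_m(t) = \sum_{n=1}^{N} s_{mn}\psi_n(t)$$


with $m \in \{1, 2, \ldots, M\}$ where $s_{mn} = \langle s_m, \psi_n \rangle$ and $E_m = \sum_{n=1}^{N} s_{mn}^2$.


The signals can be represented by $s_m = (s_{m1}, s_{m2}, \ldots, s_{mN})^T$.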
Example: 
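(Figure: $s_1(t)$ has amplitude $A$ and $s_2(t)$ has amplitude $-A$ over $0 \le t \le T$.)
Equation:
$$\psi_1(t) = \frac{s_1(t)}{\sqrt{A^2 T}}$$
Equation:
$$s_{11} = A\sqrt{T}$$
Equation:
$$s_{21} = -A\sqrt{T}$$
Equation:


$$\psi_2(t) = (s_2(t) - s_{21}\psi_1(t))\frac{1}{\sqrt{\hat{E}_2}} = \left(-A + \frac{A\sqrt{T}}{\sqrt{T}}\right)\frac{1}{\sqrt{\hat{E}_2}} = 0$$


Dimension of the signal set is 1 with $E_1 = s_{11}^2$ and $E_2 = s_{21}^2$.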


Example: 
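(Figure: four orthogonal signals $s_1(t), s_2(t), s_3(t), s_4(t)$.)


$\psi_m(t) = \frac{s_m(t)}{\sqrt{E_s}}$ where $E_s = \int_0^T s_m^2(t)\, dt$


$$s_1 = \begin{pmatrix} \sqrt{E_s} \\ 0 \\ 0 \\ 0 \end{pmatrix},\quad s_2 = \begin{pmatrix} 0 \\ \sqrt{E_s} \\ 0 \\ 0 \end{pmatrix},\quad s_3 = \begin{pmatrix} 0 \\ 0 \\ \sqrt{E_s} \\ 0 \end{pmatrix},\quad \text{and}\quad s_4 = \begin{pmatrix} 0 \\ 0 \\ 0 \\ \sqrt{E_s} \end{pmatrix}$$


Equation:


$$d_{mn} = |s_m - s_n| = \sqrt{\sum_{j=1}^{N} (s_{mj} - s_{nj})^2} = \sqrt{2E_s} \quad \text{for all } m \ne n$$


is the Euclidean distance between signals.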


Example: 
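Set of 4 equal energy biorthogonal signals. $s_1(t) = s(t)$, $s_2(t) = s^{\perp}(t)$,
$s_3(t) = -s(t)$, $s_4(t) = -s^{\perp}(t)$.

The orthonormal basis $\psi_1(t) = \frac{s(t)}{\sqrt{E_s}}$, $\psi_2(t) = \frac{s^{\perp}(t)}{\sqrt{E_s}}$ where


$E_s = \int_0^T s_m^2(t)\, dt$

$$s_1 = \begin{pmatrix} \sqrt{E_s} \\ 0 \end{pmatrix},\quad s_2 = \begin{pmatrix} 0 \\ \sqrt{E_s} \end{pmatrix},\quad s_3 = \begin{pmatrix} -\sqrt{E_s} \\ 0 \end{pmatrix},\quad s_4 = \begin{pmatrix} 0 \\ -\sqrt{E_s} \end{pmatrix}$$


The four signals can be geometrically represented using the vectors of
projection coefficients $s_1$, $s_2$, $s_3$, and $s_4$ as a set of constellation points.
Signal constellation


(Figure: signal constellation in the plane spanned by $\psi_1(t)$ and $\psi_2(t)$.)


Equation:


$$d_{21} = |s_2 - s_1| = \sqrt{2E_s}$$

Equation:
$$d_{12} = d_{23} = d_{34} = d_{14}$$

Equation:

$$d_{13} = |s_1 - s_3| = 2\sqrt{E_s}$$

Equation:
$$d_{13} = d_{24}$$


Minimum distance $d_{\min} = \sqrt{2E_s}$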
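The distances in this constellation can be computed directly from the coordinate vectors. The following sketch assumes the four-point biorthogonal constellation $(\pm\sqrt{E_s}, 0)$, $(0, \pm\sqrt{E_s})$ described above; the function names are illustrative.

```python
import itertools
import math

def biorthogonal_constellation(es):
    """Constellation points s1..s4 for 4 equal-energy biorthogonal signals."""
    r = math.sqrt(es)
    return [(r, 0.0), (0.0, r), (-r, 0.0), (0.0, -r)]

def distance(u, v):
    """Euclidean distance between two constellation points."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def minimum_distance(points):
    """Smallest distance over all distinct pairs of points."""
    return min(distance(u, v) for u, v in itertools.combinations(points, 2))

if __name__ == "__main__":
    points = biorthogonal_constellation(4.0)
    print(distance(points[0], points[1]), distance(points[0], points[2]))
    print(minimum_distance(points))
```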


Demodulation and Detection 


Consider the problem where signal set, $\{s_1, s_2, \ldots, s_M\}$, for $t \in [0, T]$ is
used to transmit $\log_2 M$ bits. The modulated signal $X_t$ could be
$\{s_1, s_2, \ldots, s_M\}$ during the interval $0 \le t \le T$.


$r_t = X_t + N_t = s_m(t) + N_t$ for $0 \le t \le T$ for
some $m \in \{1, 2, \ldots, M\}$.


Recall $s_m(t) = \sum_{n=1}^{N} s_{mn}\psi_n(t)$ for $m \in \{1, 2, \ldots, M\}$; the signals are
decomposed into a set of orthonormal signals, perfectly.


Noise process can also be decomposed
Equation:


$$N_t = \sum_{n=1}^{N} \eta_n\psi_n(t) + \tilde{N}_t$$


where $\eta_n = \int_0^T N_t\psi_n(t)\, dt$ is the projection onto the $n$th basis signal, and
$\tilde{N}_t$ is the left over noise.


The problem of demodulation and detection is to observe $r_t$ for
$0 \le t \le T$ and decide which one of the $M$ signals was transmitted.
Demodulation is covered here. A discussion about detection can be found 
here. 


Demodulation 


Demodulation 


Convert the continuous time received signal into a vector without loss of 
information (or performance). 


Equation:
$$r_t = s_m(t) + N_t$$
Equation:
$$r_t = \sum_{n=1}^{N} s_{mn}\psi_n(t) + \sum_{n=1}^{N} \eta_n\psi_n(t) + \tilde{N}_t$$
Equation:
$$r_t = \sum_{n=1}^{N} (s_{mn} + \eta_n)\psi_n(t) + \tilde{N}_t$$
Equation:
$$r_t = \sum_{n=1}^{N} r_n\psi_n(t) + \tilde{N}_t$$


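The noise projection coefficients $\eta_n$'s are zero mean, Gaussian random
variables and are mutually independent if $N_t$ is a white Gaussian process.
Equation:


$$\mu_\eta(n) = E[\eta_n] = E\left[\int_0^T N_t\psi_n(t)\, dt\right]$$


Equation:


$$\mu_\eta(n) = \int_0^T E[N_t]\psi_n(t)\, dt = 0$$
Equation:
$$E[\eta_k\eta_n] = E\left[\int_0^T N_t\psi_k(t)\, dt \int_0^T N_{t'}\psi_n(t')\, dt'\right] = \int_0^T\int_0^T E[N_tN_{t'}]\psi_k(t)\psi_n(t')\, dt\, dt'$$
Equation:
$$E[\eta_k\eta_n] = \int_0^T\int_0^T R_N(t - t')\psi_k(t)\psi_n(t')\, dt\, dt'$$
Equation:


$$E[\eta_k\eta_n] = \frac{N_0}{2}\int_0^T\int_0^T \delta(t - t')\psi_k(t)\psi_n(t')\, dt\, dt'$$
Equation:


$$E[\eta_k\eta_n] = \frac{N_0}{2}\int_0^T \psi_k(t)\psi_n(t)\, dt = \frac{N_0}{2}\delta_{kn} = \begin{cases} \frac{N_0}{2} & \text{if } k = n \\ 0 & \text{if } k \ne n \end{cases}$$


The $\eta_k$'s are uncorrelated and since they are Gaussian they are also
independent. Therefore, $\eta_k \sim \text{Gaussian}(0, \frac{N_0}{2})$ and $R_\eta(k,n) = \frac{N_0}{2}\delta_{kn}$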


The $r_n$'s, the projection of the received signal $r_t$ onto the orthonormal bases
$\psi_n(t)$'s, are independent from the residual noise process $\tilde{N}_t$.


The residual noise $\tilde{N}_t$ is irrelevant to the decision process on $r_t$.


Recall $r_n = s_{mn} + \eta_n$, given $s_m(t)$ was transmitted. Therefore,
Equation:


$$\mu_r(n) = E[s_{mn} + \eta_n] = s_{mn}$$


Equation:

$$\mathrm{Var}(r_n) = \mathrm{Var}(\eta_n) = \frac{N_0}{2}$$
The correlation between 7,, and N; 
Equation: 


B| Nira | = ale = nl) ) 


Equation: 
_ N N 
E|Nir| = E [a —S> mebe(t)| 8mm + Elna] — S> Elta (t) 
k=1 k=1 
Equation: 
ee Di aia ao NING 
0 k=1 
Equation: 


B|Nira| = [ salt —t/)dn(t’) dt! — dal) 


Equation: 


B|Nr| = “Ydalt) — 3e4n(t) 
= 9 


Since both $\tilde{N}_t$ and $r_n$ are Gaussian, $\tilde{N}_t$ and $r_n$ are also independent.

The conjecture is to ignore $\tilde{N}_t$ and extract information from
$r = (r_1, r_2, \ldots, r_N)^T$.

Knowing the vector $r$ we can reconstruct the relevant part of the random
process $r_t$ for $0 \le t \le T$:
Equation:
$$r_t = s_m(t) + N_t = \sum_{n=1}^{N} r_n\psi_n(t) + \tilde{N}_t$$


Detector 


Once the received signal has been converted to a vector, the correct
transmitted signal must be detected based upon observations of the input
vector. Detection is covered in Detection by Correlation.


Detection by Correlation 
Demodulation and Detection


[Figure: block diagram of the demodulator followed by the detector]


Detection 


Decide which $s_m(t)$ from the set of signals $\{s_1(t), \ldots, s_M(t)\}$ was
transmitted based on observing $r = (r_1, r_2, \ldots, r_N)^T$, the vector composed of the
demodulated received signal, that is, the vector of projections of the received
signal onto the $N$ bases.
Equation:
$$\hat{m} = \arg\max_{1 \le m \le M} \Pr[s_m(t) \text{ was transmitted} \mid r \text{ was observed}]$$


Note that
Equation:
$$\Pr[s_m \mid r] = \Pr[s_m(t) \text{ was transmitted} \mid r \text{ was observed}] = \frac{f_{r|s_m}\Pr[s_m]}{f_r}$$


If $\Pr[s_m \text{ was transmitted}] = \frac{1}{M}$, that is, information symbols are equally
likely to be transmitted, then
Equation:
$$\arg\max_{1 \le m \le M} \Pr[s_m \mid r] = \arg\max_{1 \le m \le M} f_{r|s_m}$$


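Since $r_t = s_m(t) + N_t$ for $0 \le t \le T$ and for some $m \in \{1, 2, \ldots, M\}$,
then $r = s_m + \eta$ where $\eta = (\eta_1, \eta_2, \ldots, \eta_N)^T$ and the $\eta_n$ are Gaussian and independent.
Equation:
$$\forall r_n \in \mathbb{R}: \quad f_{r|s_m} = \frac{1}{(\pi N_0)^{N/2}} \exp\left(-\frac{1}{N_0}\sum_{n=1}^{N}(r_n - s_{mn})^2\right)$$
Equation:
$$\hat{m} = \arg\max_{1 \le m \le M} f_{r|s_m} = \arg\max_{1 \le m \le M} \ln f_{r|s_m}$$
$$= \arg\max_{1 \le m \le M} \left(-\frac{N}{2}\ln(\pi N_0) - \frac{1}{N_0}\sum_{n=1}^{N}(r_n - s_{mn})^2\right)$$
$$= \arg\min_{1 \le m \le M} \sum_{n=1}^{N}(r_n - s_{mn})^2$$


where $D(r, s_m)$ is the $l_2$ distance between vectors $r$ and $s_m$, defined as
$D(r, s_m) \triangleq \sum_{n=1}^{N}(r_n - s_{mn})^2$.


Equation:
$$\hat{m} = \arg\min_{1 \le m \le M} D(r, s_m) = \arg\min_{1 \le m \le M} \left(\|r\|^2 - 2\langle r, s_m\rangle + \|s_m\|^2\right)$$


where $\|r\|$ is the $l_2$ norm of vector $r$, defined as $\|r\| = \sqrt{\sum_{n=1}^{N} r_n^2}$.


Equation:
$$\hat{m} = \arg\max_{1 \le m \le M} \left(2\langle r, s_m\rangle - \|s_m\|^2\right)$$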


This type of receiver system is known as a correlation (or correlator-type)
receiver. Examples of the use of such a system are given in Examples of
Correlation Detection. Another type of receiver involves linear, time-invariant
filters and is known as a matched filter receiver. An analysis of the performance
of a correlator-type receiver using antipodal and orthogonal binary signals can
be found in Performance Analysis.
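
As an illustration of the decision rule, the following minimal sketch compares the minimum-distance form with the correlation form for a received vector. The constellation, the received vector and the function names are hypothetical and chosen only for this example.

```python
def min_distance_decision(r, signals):
    """Return the 1-based index m minimizing D(r, s_m) = sum_n (r_n - s_mn)^2."""
    dists = [sum((rn - sn) ** 2 for rn, sn in zip(r, s)) for s in signals]
    return dists.index(min(dists)) + 1


def correlation_decision(r, signals):
    """Return the 1-based index m maximizing 2<r, s_m> - ||s_m||^2."""
    metrics = [
        2 * sum(rn * sn for rn, sn in zip(r, s)) - sum(sn * sn for sn in s)
        for s in signals
    ]
    return metrics.index(max(metrics)) + 1


# Hypothetical 2-D constellation with unequal energies and one received vector.
signals = [(1.0, 0.0), (0.0, 2.0), (-1.0, -1.0)]
r = (0.4, 1.1)

m_dist = min_distance_decision(r, signals)
m_corr = correlation_decision(r, signals)
print(m_dist, m_corr)
```

Both rules pick the same signal because the term $\|r\|^2$ is common to every candidate; the energy term $\|s_m\|^2$ only matters when the signals have unequal energies, as they do here.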


Examples of Correlation Detection 


The implementation and theory of correlator-type receivers can be found in 
Detection. 


Example:


$\hat{m} = 2$ since $D(r, s_1) > D(r, s_2)$, or $\|s_1\|^2 = \|s_2\|^2$ and
$\langle r, s_2\rangle > \langle r, s_1\rangle$.


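Example:
Data symbols "0" or "1" with equal probability. Modulator: $s_1(t) = s(t)$ for
$0 \le t \le T$ and $s_2(t) = -s(t)$ for $0 \le t \le T$.

[Figure: $s_1(t)$ with amplitude $A$ and $s_2(t)$ with amplitude $-A$ over $[0, T]$]

$\psi_1(t) = \frac{s(t)}{\sqrt{A^2 T}}$, $s_{11} = A\sqrt{T}$, and $s_{21} = -A\sqrt{T}$


Equation:
$$\forall m \in \{1, 2\}: \quad r_t = s_m(t) + N_t$$
Equation:
$$r_1 = A\sqrt{T} + \eta_1$$
or
Equation:
$$r_1 = -A\sqrt{T} + \eta_1$$


$\eta_1$ is Gaussian with zero mean and variance $\frac{N_0}{2}$.


$\hat{m} = \arg\max\{A\sqrt{T}\,r_1,\ -A\sqrt{T}\,r_1\}$. Since $A\sqrt{T} > 0$ and
$\Pr[s_1] = \Pr[s_2]$, the MAP decision rule decides:

$s_1(t)$ was transmitted if $r_1 \ge 0$

$s_2(t)$ was transmitted if $r_1 < 0$

An alternate demodulator:

Equation:
$$(r_t = s_m(t) + N_t) \Rightarrow (r = s_m + \eta)$$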


Matched Filters 


Signal-to-noise ratio (SNR) at the output of the demodulator is a measure
of the quality of the demodulator.

Equation:
$$\mathrm{SNR} = \frac{\text{signal energy}}{\text{noise energy}}$$


In the correlator described earlier, $E_s = \|s_m\|^2$ and $\sigma_\eta^2 = \frac{N_0}{2}$. Is it
possible to design a demodulator based on linear time-invariant filters with
maximum signal-to-noise ratio?


[Figure: filter-based demodulator followed by the detector]


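If $s_m(t)$ is the transmitted signal, then the output of the $k$-th filter is given as
Equation:
$$y_k(t) = \int_{-\infty}^{\infty} r_\tau h_k(t - \tau)\,d\tau = \int_{-\infty}^{\infty} (s_m(\tau) + N_\tau)\,h_k(t - \tau)\,d\tau$$
$$= \int_{-\infty}^{\infty} s_m(\tau) h_k(t - \tau)\,d\tau + \int_{-\infty}^{\infty} N_\tau h_k(t - \tau)\,d\tau$$


Sampling the output at time $T$ yields
Equation:
$$y_k(T) = \int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau + \int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau$$


The noise contribution:
Equation:
$$\nu_k = \int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau$$


The expected value of the noise component is
Equation:
$$E[\nu_k] = E\left[\int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau\right] = 0$$


The variance of the noise component is the second moment, since the mean
is zero, and is given as
Equation:
$$\sigma^2(\nu_k) = E[\nu_k^2] = E\left[\int_{-\infty}^{\infty} N_\tau h_k(T - \tau)\,d\tau \int_{-\infty}^{\infty} N_{\tau'} h_k(T - \tau')\,d\tau'\right]$$
Equation:
$$E[\nu_k^2] = \int_{-\infty}^{\infty}\!\!\int_{-\infty}^{\infty} \frac{N_0}{2}\delta(\tau - \tau')\,h_k(T - \tau)\,h_k(T - \tau')\,d\tau\,d\tau' = \frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau$$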


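Signal energy can be written as


Equation:
$$\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2$$


and the signal-to-noise ratio (SNR) as
Equation:
$$\mathrm{SNR} = \frac{\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau}$$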


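The signal-to-noise ratio can be maximized by considering the well-known
Cauchy-Schwarz inequality
Equation:
$$\left(\int_{-\infty}^{\infty} g_1(x) g_2(x)\,dx\right)^2 \le \int_{-\infty}^{\infty} |g_1(x)|^2\,dx \int_{-\infty}^{\infty} |g_2(x)|^2\,dx$$


with equality when $g_1(x) = \alpha g_2(x)$. Applying the inequality directly
yields an upper bound on SNR
Equation:
$$\frac{\left(\int_{-\infty}^{\infty} s_m(\tau) h_k(T - \tau)\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} |h_k(T - \tau)|^2\,d\tau} \le \frac{2}{N_0}\int_{-\infty}^{\infty} |s_m(\tau)|^2\,d\tau = \frac{2E_s}{N_0}$$


with equality when $\forall \tau: h_k^{opt}(T - \tau) = \alpha s_m(\tau)$. Therefore, the filter to
examine signal $m$ should be
Equation:
Matched Filter
$$\forall \tau: \quad h_m^{opt}(\tau) = s_m(T - \tau)$$


The constant factor is not relevant when one considers the signal-to-noise
ratio. The maximum SNR is unchanged when both the numerator and
denominator are scaled.


Equation:
$$\frac{\left(\int_{-\infty}^{\infty} s_m(\tau)^2\,d\tau\right)^2}{\frac{N_0}{2}\int_{-\infty}^{\infty} s_m(\tau)^2\,d\tau} = \frac{2E_s}{N_0}$$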


Examples involving matched filter receivers can be found in Examples with
Matched Filters. An analysis in the frequency domain is contained in Matched
Filters in the Frequency Domain.


Another type of receiver system is the correlation receiver. A performance 
analysis of both matched filters and correlator-type receivers can be found 
in Performance Analysis. 
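
The bound $2E_s/N_0$ can be checked numerically. The sketch below is only an illustration under stated assumptions: the half-sine pulse, the value of $N_0$ and the midpoint-rule discretization are hypothetical choices, and `g` holds samples of $h(T - \tau)$.

```python
import math


def output_snr(s, g, n0, dt):
    """SNR at the sampling instant for a filter with g(tau) = h(T - tau)."""
    signal = sum(si * gi for si, gi in zip(s, g)) * dt
    noise_var = (n0 / 2) * sum(gi * gi for gi in g) * dt
    return signal ** 2 / noise_var


T, n, n0 = 1.0, 1000, 0.5                      # hypothetical values
dt = T / n
t = [(i + 0.5) * dt for i in range(n)]
s = [math.sin(math.pi * ti / T) for ti in t]   # hypothetical pulse shape
energy = sum(si * si for si in s) * dt         # E_s
bound = 2 * energy / n0                        # Cauchy-Schwarz upper bound

snr_matched = output_snr(s, s, n0, dt)                     # h(T - tau) = s(tau)
snr_scaled = output_snr(s, [3 * si for si in s], n0, dt)   # constant factor
snr_rect = output_snr(s, [1.0] * n, n0, dt)                # mismatched filter

print(bound, snr_matched, snr_scaled, snr_rect)
```

The matched and scaled filters both reach the bound, while the rectangular (mismatched) filter falls below it.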


Examples with Matched Filters 


The theory and rationale behind matched filter receivers can be found in 
Matched Filters. 


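Example:

[Figure: signals $s_1(t)$, $s_2(t)$ and the matched filters $h_1(t)$, $h_2(t)$ over $[0, T]$]

$s_1(t) = t$ for $0 \le t \le T$

$s_2(t) = -t$ for $0 \le t \le T$

$h_1(t) = T - t$ for $0 \le t \le T$

$h_2(t) = -T + t$ for $0 \le t \le T$


Equation:
$$\forall t,\ 0 \le t \le 2T: \quad \tilde{s}_1(t) = \int_{-\infty}^{\infty} s_1(\tau) h_1(t - \tau)\,d\tau$$
Equation:
$$\tilde{s}_1(t) = \int_0^t \tau (T - t + \tau)\,d\tau = \frac{1}{2}(T - t)t^2 + \frac{1}{3}t^3 = \frac{t^2}{2}\left(T - \frac{t}{3}\right) \quad \text{for } 0 \le t \le T$$
Equation:
$$\tilde{s}_1(T) = \frac{T^3}{3}$$
Compared to the correlator-type demodulation
Equation:
$$\psi_1(t) = \frac{s_1(t)}{\sqrt{E_s}}$$
Equation:
$$s_{11} = \int_0^T s_1(\tau)\psi_1(\tau)\,d\tau$$

[Figure: matched filter output and correlator output for $s_1(t)$]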
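
A quick numerical check of this example, as a sketch only: it assumes the ramp signal $s_1(t) = t$ with matched filter $h_1(t) = T - t$ on $[0, T]$, and the value $T = 2$ and the midpoint-rule integration are arbitrary illustrative choices.

```python
def matched_filter_output(t, T, steps=20000):
    """Midpoint-rule value of integral_0^t tau * (T - t + tau) d tau, 0 <= t <= T."""
    d = t / steps
    total = 0.0
    for i in range(steps):
        tau = (i + 0.5) * d
        total += tau * (T - t + tau) * d
    return total


def closed_form(t, T):
    """Closed form (t^2 / 2) * (T - t / 3), valid for 0 <= t <= T."""
    return 0.5 * t * t * (T - t / 3.0)


T = 2.0                                  # hypothetical symbol duration
peak = matched_filter_output(T, T)       # filter output sampled at t = T
halfway = matched_filter_output(T / 2, T)
print(peak, T ** 3 / 3, halfway, closed_form(T / 2, T))
```

At the sampling instant $t = T$ the output equals the signal energy $T^3/3$.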


Example:

Assume binary data is transmitted at the rate of $\frac{1}{T}$ Hertz.

$0 \Rightarrow (b = 1) \Rightarrow (s_1(t) = s(t))$ for $0 \le t \le T$

$1 \Rightarrow (b = -1) \Rightarrow (s_2(t) = -s(t))$ for $0 \le t \le T$
Equation:
$$X_t = \sum_{i=-P}^{P} b_i\,s(t - iT)$$


Performance Analysis of Binary Orthogonal Signals with Correlation


Orthogonal signals with equally likely bits, $r_t = s_m(t) + N_t$ for $0 \le t \le T$, $m = 1$ or $m = 2$, and $\langle s_1, s_2\rangle = 0$.
Correlation (correlator-type) receiver


$r = (r_1, r_2)^T = s_m + \eta$ (see [link])


Decide $s_1(t)$ was transmitted if $r_1 \ge r_2$.
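Equation:
$$P_e = \tfrac{1}{2}\Pr[r \in R_2 \mid s_1(t) \text{ transmitted}] + \tfrac{1}{2}\Pr[r \in R_1 \mid s_2(t) \text{ transmitted}]$$
$$= \tfrac{1}{2}\iint_{R_2} f_{r|s_1(t)}(r)\,dr_1\,dr_2 + \tfrac{1}{2}\iint_{R_1} f_{r|s_2(t)}(r)\,dr_1\,dr_2$$
$$= \tfrac{1}{2}\iint_{R_2} \frac{1}{\sqrt{\pi N_0}}e^{-\frac{(r_1 - \sqrt{E_s})^2}{N_0}}\,\frac{1}{\sqrt{\pi N_0}}e^{-\frac{r_2^2}{N_0}}\,dr_1\,dr_2 + \tfrac{1}{2}\iint_{R_1} \frac{1}{\sqrt{\pi N_0}}e^{-\frac{r_1^2}{N_0}}\,\frac{1}{\sqrt{\pi N_0}}e^{-\frac{(r_2 - \sqrt{E_s})^2}{N_0}}\,dr_1\,dr_2$$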


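Alternatively, if $s_1(t)$ is transmitted we decide on the wrong signal if $r_2 > r_1$, or $\eta_2 > \eta_1 + \sqrt{E_s}$, that is, when
$\eta_2 - \eta_1 > \sqrt{E_s}$.


Equation:
$$P_e = \tfrac{1}{2}\int_{\sqrt{E_s}}^{\infty} \frac{1}{\sqrt{2\pi N_0}}\,e^{-\frac{\eta'^2}{2N_0}}\,d\eta' + \tfrac{1}{2}\Pr[r_1 \ge r_2 \mid s_2(t) \text{ transmitted}] = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$


Note that the distance between $s_1$ and $s_2$ is $d_{12} = \sqrt{2E_s}$. The average bit error probability is $P_e = Q\left(\frac{d_{12}}{\sqrt{2N_0}}\right)$, as we
had for the antipodal case. Note also that the bit-error probability is the same as for the matched filter receiver.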
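
The closed-form result $P_e = Q(\sqrt{E_s/N_0})$ for binary orthogonal signals can be compared with a simulation of the correlator outputs. This is a minimal Monte Carlo sketch; the values of $E_s$, $N_0$, the trial count and the seed are hypothetical.

```python
import math
import random


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def simulate_orthogonal(es, n0, trials, seed=1):
    """Estimate the error rate of the rule 'decide s_1 if r_1 >= r_2'."""
    rng = random.Random(seed)
    sigma = math.sqrt(n0 / 2.0)          # each eta_n has variance N0/2
    errors = 0
    for _ in range(trials):
        send_first = rng.random() < 0.5
        r1 = (math.sqrt(es) if send_first else 0.0) + rng.gauss(0.0, sigma)
        r2 = (0.0 if send_first else math.sqrt(es)) + rng.gauss(0.0, sigma)
        decide_first = r1 >= r2
        errors += decide_first != send_first
    return errors / trials


es, n0 = 4.0, 1.0                        # hypothetical operating point
p_theory = q_function(math.sqrt(es / n0))
p_sim = simulate_orthogonal(es, n0, trials=200000)
print(p_theory, p_sim)
```

The simulated error rate should agree with the formula to within the statistical fluctuation of the trial count.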


Performance Analysis of Orthogonal Binary Signals with Matched Filters 
If $s_1(t)$ is transmitted


Equation:
$$Y_1(T) = \int_{-\infty}^{\infty} s_1(\tau) h_1^{opt}(T - \tau)\,d\tau + \nu_1(T) = \int_{-\infty}^{\infty} s_1(\tau)s_1(\tau)\,d\tau + \nu_1(T) = E_s + \nu_1(T)$$
Equation:
$$Y_2(T) = \int_{-\infty}^{\infty} s_1(\tau)s_2(\tau)\,d\tau + \nu_2(T) = \nu_2(T)$$


If $s_2(t)$ is transmitted, $Y_1(T) = \nu_1(T)$ and $Y_2(T) = E_s + \nu_2(T)$,


where $\nu_1$ and $\nu_2$ are independent Gaussian random variables with zero mean and variance
$\frac{N_0}{2}E_s$. The analysis is identical to the correlator example.


Equation:
$$P_e = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$


Note that the maximum likelihood detector decides based on comparing $Y_1$
and $Y_2$. If $Y_1 \ge Y_2$ then $s_1$ was sent; otherwise $s_2$ was transmitted. For a
similar analysis for binary antipodal signals, see [link] or [link].


Carrier Phase Modulation 


Phase Shift Keying (PSK) 
Information is impressed on the phase of the carrier. As data changes from symbol period to symbol period, the
phase shifts.
Equation:
$$\forall m \in \{1, 2, \ldots, M\}: \quad s_m(t) = A P_T(t)\cos\left(2\pi f_c t + \frac{2\pi(m - 1)}{M}\right)$$

Example: 
Binary s1(t) or s2(t) 


Representing the Signals 


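An orthonormal basis to represent the signals is


Equation:
$$\psi_1(t) = \frac{1}{\sqrt{E_s}} A P_T(t)\cos(2\pi f_c t)$$
Equation:
$$\psi_2(t) = \frac{-1}{\sqrt{E_s}} A P_T(t)\sin(2\pi f_c t)$$
The signal
Equation:
$$s_m(t) = A P_T(t)\cos\left(2\pi f_c t + \frac{2\pi(m - 1)}{M}\right)$$
Equation:
$$s_m(t) = A\cos\left(\frac{2\pi(m - 1)}{M}\right)P_T(t)\cos(2\pi f_c t) - A\sin\left(\frac{2\pi(m - 1)}{M}\right)P_T(t)\sin(2\pi f_c t)$$


The signal energy
Equation:
$$E_s = \int_{-\infty}^{\infty} A^2 P_T^2(t)\cos^2\left(2\pi f_c t + \frac{2\pi(m - 1)}{M}\right)dt = \int_0^T A^2\left(\frac{1}{2} + \frac{1}{2}\cos\left(4\pi f_c t + \frac{4\pi(m - 1)}{M}\right)\right)dt$$
Equation:
$$E_s = \frac{A^2 T}{2} + \frac{A^2}{2}\int_0^T \cos\left(4\pi f_c t + \frac{4\pi(m - 1)}{M}\right)dt \approx \frac{A^2 T}{2}$$


(Note that in the above equation, the integral in the last step before the approximation is very small.) Therefore,
Equation:
$$\psi_1(t) = \sqrt{\frac{2}{T}}\,P_T(t)\cos(2\pi f_c t)$$
Equation:
$$\psi_2(t) = -\sqrt{\frac{2}{T}}\,P_T(t)\sin(2\pi f_c t)$$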
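In general,
Equation:
$$\forall m \in \{1, 2, \ldots, M\}: \quad s_m(t) = A P_T(t)\cos\left(2\pi f_c t + \frac{2\pi(m - 1)}{M}\right)$$

and
Equation:
$$\psi_1(t) = \sqrt{\frac{2}{T}}\,P_T(t)\cos(2\pi f_c t)$$
Equation:
$$\psi_2(t) = -\sqrt{\frac{2}{T}}\,P_T(t)\sin(2\pi f_c t)$$
Equation:
$$s_m = \begin{pmatrix} \sqrt{E_s}\cos\left(\frac{2\pi(m - 1)}{M}\right) \\ \sqrt{E_s}\sin\left(\frac{2\pi(m - 1)}{M}\right) \end{pmatrix}$$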
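
The constellation vectors can be verified by projecting $s_m(t)$ numerically onto the basis $\psi_1(t) = \sqrt{2/T}\,P_T(t)\cos(2\pi f_c t)$ and $\psi_2(t) = -\sqrt{2/T}\,P_T(t)\sin(2\pi f_c t)$. The sketch below assumes hypothetical values of $A$, $T$, $f_c$ and $M$, with $f_c T$ an integer so that the double-frequency term integrates to zero.

```python
import math


def psk_vector(m, M, es):
    """Closed-form coordinates (sqrt(Es) cos(theta_m), sqrt(Es) sin(theta_m))."""
    theta = 2.0 * math.pi * (m - 1) / M
    return (math.sqrt(es) * math.cos(theta), math.sqrt(es) * math.sin(theta))


def project(m, M, A, T, fc, steps=20000):
    """Numerically project s_m(t) onto psi_1 and psi_2 (midpoint rule)."""
    dt = T / steps
    theta = 2.0 * math.pi * (m - 1) / M
    scale = math.sqrt(2.0 / T)
    c1 = c2 = 0.0
    for i in range(steps):
        t = (i + 0.5) * dt
        s = A * math.cos(2.0 * math.pi * fc * t + theta)
        c1 += s * scale * math.cos(2.0 * math.pi * fc * t) * dt
        c2 += s * (-scale) * math.sin(2.0 * math.pi * fc * t) * dt
    return (c1, c2)


A, T, fc, M = 2.0, 1.0, 10.0, 4          # hypothetical values; fc * T is an integer
es = A * A * T / 2.0                     # E_s = A^2 T / 2

for m in range(1, M + 1):
    print(m, psk_vector(m, M, es), project(m, M, A, T, fc))
```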


Demodulation and Detection 
Equation:
$$r_t = s_m(t) + N_t, \quad \text{for some } m \in \{1, 2, \ldots, M\}$$
We must note that, due to phase offset of the oscillator at the transmitter, phase jitter or phase changes occur
because of propagation delay.
Equation:
$$r_t = A P_T(t)\cos\left(2\pi f_c t + \frac{2\pi(m - 1)}{M} + \phi\right) + N_t$$


For binary PSK, the modulation is antipodal, and the optimum receiver in AWGN has average bit-error
probability
Equation:
$$P_e = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) = Q\left(A\sqrt{\frac{T}{N_0}}\right)$$


The receiver where
Equation:
$$r_t = \pm A P_T(t)\cos(2\pi f_c t + \phi) + N_t$$
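The statistics
Equation:
$$r_1 = \int_0^T r_t\,\alpha\cos(2\pi f_c t + \hat{\phi})\,dt = \pm\int_0^T \alpha A\cos(2\pi f_c t + \phi)\cos(2\pi f_c t + \hat{\phi})\,dt + \int_0^T \alpha\cos(2\pi f_c t + \hat{\phi})\,N_t\,dt$$
Equation:
$$r_1 = \pm\frac{\alpha A}{2}\int_0^T \left(\cos(4\pi f_c t + \phi + \hat{\phi}) + \cos(\phi - \hat{\phi})\right)dt + \eta_1$$
Equation:
$$r_1 = \pm\frac{\alpha A}{2}T\cos(\phi - \hat{\phi}) \pm \int_0^T \frac{\alpha A}{2}\cos(4\pi f_c t + \phi + \hat{\phi})\,dt + \eta_1 \approx \pm\frac{\alpha A T}{2}\cos(\phi - \hat{\phi}) + \eta_1$$


where $\eta_1 = \alpha\int_0^T N_t\cos(\omega_c t + \hat{\phi})\,dt$ is zero mean Gaussian with variance $\approx \frac{\alpha^2 N_0 T}{4}$.


Therefore,
Equation:
$$P_e = Q\left(\frac{\frac{\alpha A T}{2}\cos(\phi - \hat{\phi})}{\sqrt{\frac{\alpha^2 N_0 T}{4}}}\right) = Q\left(\cos(\phi - \hat{\phi})\,A\sqrt{\frac{T}{N_0}}\right)$$


which is not a function of $\alpha$ and depends strongly on phase accuracy.
Equation:
$$P_e = Q\left(\cos(\phi - \hat{\phi})\sqrt{\frac{2E_s}{N_0}}\right)$$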


The above result implies that the amplitude of the local oscillator in the correlator structure does not play a role 
in the performance of the correlation receiver. However, the accuracy of the phase does indeed play a major 
role. This point can be seen in the following example: 
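
To make the sensitivity to phase error concrete, the following sketch evaluates $P_e = Q\left(\cos(\phi - \hat{\phi})\sqrt{2E_s/N_0}\right)$ for several phase errors. The operating point of 7 dB and the list of phase errors are hypothetical.

```python
import math


def q_function(x):
    """Gaussian tail probability Q(x)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def bpsk_error_probability(es_over_n0, phase_error):
    """P_e for coherent BPSK when the local oscillator is off by phase_error (radians)."""
    return q_function(math.cos(phase_error) * math.sqrt(2.0 * es_over_n0))


es_over_n0 = 10 ** (7.0 / 10)            # hypothetical operating point: 7 dB
phase_errors_deg = (0, 10, 30, 60, 90)
pe = {
    deg: bpsk_error_probability(es_over_n0, math.radians(deg))
    for deg in phase_errors_deg
}
for deg in phase_errors_deg:
    print(deg, pe[deg])
```

With a 90 degree phase error the signal component vanishes from the decision statistic and the receiver is reduced to guessing, so the error probability reaches 1/2.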


Example: 
Equation: 


oo EVA cos (— (2x ft’) a5 21 f.T) 
Equation: 


a, = —1'Acos (Qr fet — (Qa fer’ —2nf.or + 6')) 


Local oscillator should match to phase 6. 


Carrier Frequency Modulation 


Frequency Shift Keying (FSK) 


The data is impressed upon the carrier frequency. Therefore, the $M$ different signals are
Equation:
$$s_m(t) = A P_T(t)\cos(2\pi f_c t + 2\pi(m - 1)\Delta f\,t + \theta_m)$$


for $m \in \{1, 2, \ldots, M\}$


The $M$ different signals have $M$ different carrier frequencies with possibly different phase angles, since the
generators of these carrier signals may be different. The carriers are


Equation:
$$f_1 = f_c$$
$$f_2 = f_c + \Delta f$$
$$f_M = f_c + (M - 1)\Delta f$$
Thus, the $M$ signals may be designed to be orthogonal to each other.
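Equation:
$$\langle s_m, s_n\rangle = \int_0^T A^2\cos(2\pi f_c t + 2\pi(m - 1)\Delta f\,t + \theta_m)\cos(2\pi f_c t + 2\pi(n - 1)\Delta f\,t + \theta_n)\,dt$$
$$= \frac{A^2}{2}\int_0^T \cos(4\pi f_c t + 2\pi(n + m - 2)\Delta f\,t + \theta_m + \theta_n)\,dt + \frac{A^2}{2}\int_0^T \cos(2\pi(m - n)\Delta f\,t + \theta_m - \theta_n)\,dt$$
$$= \frac{A^2}{2}\,\frac{\sin(4\pi f_c T + 2\pi(n + m - 2)\Delta f\,T + \theta_m + \theta_n) - \sin(\theta_m + \theta_n)}{4\pi f_c + 2\pi(n + m - 2)\Delta f} + \frac{A^2}{2}\,\frac{\sin(2\pi(m - n)\Delta f\,T + \theta_m - \theta_n) - \sin(\theta_m - \theta_n)}{2\pi(m - n)\Delta f}$$


If $2f_c T + (n + m - 2)\Delta f\,T$ is an integer, and if $(m - n)\Delta f\,T$ is also an integer, then $\langle s_m, s_n\rangle = 0$. If
$\Delta f\,T$ is an integer, then $\langle s_m, s_n\rangle \approx 0$ when $f_c$ is much larger than $\frac{1}{T}$.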


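In case $\forall m: \theta_m = 0$
Equation:
$$\langle s_m, s_n\rangle \approx \frac{A^2 T}{2}\,\mathrm{sinc}(2(m - n)\Delta f\,T)$$


Therefore, the frequency spacing could be as small as $\Delta f = \frac{1}{2T}$, since $\mathrm{sinc}(x) = 0$ if $x = \pm 1$ or $\pm 2$.


If the signals are designed to be orthogonal then the average probability of error for binary FSK with optimum
receiver is
Equation:
$$P_e = Q\left(\sqrt{\frac{E_s}{N_0}}\right)$$


in AWGN.


Note that $\mathrm{sinc}(x)$ takes its minimum value not at $x = \pm 1$ but at about $x = \pm 1.4$, and the minimum value is about $-0.216$.
Therefore if $\Delta f = \frac{0.7}{T}$ then


Equation:
$$P_e = Q\left(\sqrt{\frac{1.216\,E_s}{N_0}}\right)$$


which is a gain of $10\log_{10} 1.216 \approx 0.85$ dB over orthogonal FSK.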
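
The numbers quoted for non-orthogonal binary FSK can be reproduced directly. The sketch below assumes zero carrier phases and uses the normalized correlation $\rho = \mathrm{sinc}(2\Delta f\,T)$ between the two signals, so that $P_e = Q\left(\sqrt{(1 - \rho)E_s/N_0}\right)$; the function names are illustrative.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def correlation_coefficient(delta_f_times_t):
    """Normalized correlation <s_1, s_2> / E_s for binary FSK with zero phases."""
    return sinc(2.0 * delta_f_times_t)


rho_orth = correlation_coefficient(0.5)      # delta_f = 1 / (2T): orthogonal
rho_min = correlation_coefficient(0.7)       # delta_f = 0.7 / T: most negative
gain_db = 10.0 * math.log10(1.0 - rho_min)   # SNR gain over orthogonal FSK

print(rho_orth, rho_min, gain_db)
```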


Differential Phase Shift Keying 


The phase lock loop provides estimates of the phase of the incoming
modulated signal. A phase ambiguity of exactly $\pi$ is a common occurrence
in many phase lock loop (PLL) implementations.


Therefore it is possible that $\hat{\theta} = \theta + \pi$ without the knowledge of the
receiver. Even if there is no noise, if $b = 1$ then $\hat{b} = 0$ and if $b = 0$ then
$\hat{b} = 1$.


In the presence of noise, an incorrect decision due to noise may result in a
correct final decision (in the binary case, when there is a $\pi$ phase ambiguity), with
the probability:

Equation:
$$P_e = 1 - Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$$


Consider a stream of bits $a_n \in \{0, 1\}$ and the BPSK modulated signal
Equation:
$$\sum_n (-1)^{a_n} A P_T(t - nT)\cos(2\pi f_c t + \theta)$$


In differential PSK, the transmitted bits are first encoded as $b_n = a_n \oplus b_{n-1}$,
with the initial symbol (e.g. $b_0$) chosen without loss of generality to be either 0
or 1.


Transmitted DPSK signals
Equation:
$$\sum_n (-1)^{b_n} A P_T(t - nT)\cos(2\pi f_c t + \theta)$$


The decoder can be constructed as
Equation:
$$b_{n-1} \oplus b_n = b_{n-1} \oplus a_n \oplus b_{n-1} = 0 \oplus a_n = a_n$$


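If two consecutive bits are detected correctly, that is, if $\hat{b}_n = b_n$ and $\hat{b}_{n-1} = b_{n-1}$,
then
Equation:
$$\hat{a}_n = \hat{b}_n \oplus \hat{b}_{n-1} = b_n \oplus b_{n-1} = a_n \oplus b_{n-1} \oplus b_{n-1} = a_n$$


If $\hat{b}_n = b_n \oplus 1$ and $\hat{b}_{n-1} = b_{n-1} \oplus 1$, that is, two consecutive bits are
detected incorrectly, then
Equation:
$$\hat{a}_n = \hat{b}_n \oplus \hat{b}_{n-1} = b_n \oplus 1 \oplus b_{n-1} \oplus 1 = b_n \oplus b_{n-1} \oplus 1 \oplus 1 = b_n \oplus b_{n-1} \oplus 0 = b_n \oplus b_{n-1} = a_n$$


If $\hat{b}_n = b_n \oplus 1$ and $\hat{b}_{n-1} = b_{n-1}$, that is, one of two consecutive bits is
detected in error, then there will be an error, and the probability of
that error for DPSK is

Equation:
$$P_e = \Pr[\hat{a}_n \ne a_n] = \Pr[\hat{b}_n = b_n, \hat{b}_{n-1} \ne b_{n-1}] + \Pr[\hat{b}_n \ne b_n, \hat{b}_{n-1} = b_{n-1}]$$
$$= 2Q\left(\sqrt{\frac{2E_s}{N_0}}\right)\left[1 - Q\left(\sqrt{\frac{2E_s}{N_0}}\right)\right] \approx 2Q\left(\sqrt{\frac{2E_s}{N_0}}\right)$$


This approximation holds if $Q$ is small.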
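
The differential encoding and decoding rules can be exercised on a short bit stream. The sketch below uses a hypothetical data sequence and shows three cases: error-free detection, a $\pi$ phase ambiguity that inverts every detected bit, and a single isolated detection error.

```python
def dpsk_encode(a_bits, b0=0):
    """Differential encoding b_n = a_n XOR b_{n-1}; returns [b_0, b_1, ..., b_N]."""
    b = [b0]
    for a in a_bits:
        b.append(a ^ b[-1])
    return b


def dpsk_decode(b_hat):
    """Differential decoding a_hat_n = b_hat_n XOR b_hat_{n-1}."""
    return [b_hat[n] ^ b_hat[n - 1] for n in range(1, len(b_hat))]


a = [1, 0, 1, 1, 0, 0, 1]                 # hypothetical data bits
b = dpsk_encode(a)

flipped = [bit ^ 1 for bit in b]          # pi phase ambiguity inverts every bit
single_error = list(b)
single_error[3] ^= 1                      # one isolated detection error

decoded_clean = dpsk_decode(b)
decoded_flipped = dpsk_decode(flipped)
decoded_single = dpsk_decode(single_error)
bit_errors = sum(x != y for x, y in zip(decoded_single, a))
print(decoded_clean, decoded_flipped, bit_errors)
```

The inverted stream still decodes correctly, which is the reason for using differential encoding. A single isolated error in the detected $b$ sequence affects the two decoded bits that depend on it.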


Digital Transmission over Baseband Channels 


Until this point, we have considered data transmissions over simple additive
Gaussian channels that are not time or band limited. In this module we will
consider channels that do have bandwidth constraints, and are limited to a
frequency range around zero (DC). The channel is best modeled by $g(t)$,
the impulse response of the baseband channel.


Consider modulated signals $x_t = s_m(t)$ for $0 \le t \le T$ for some
$m \in \{1, 2, \ldots, M\}$. The channel output is then
Equation:
$$r_t = \int_{-\infty}^{\infty} x_\tau\,g(t - \tau)\,d\tau + N_t = \int_{-\infty}^{\infty} s_m(\tau)\,g(t - \tau)\,d\tau + N_t$$


The signal contribution in the frequency domain is
Equation:
$$\forall f: \quad \tilde{S}_m(f) = S_m(f)\,G(f)$$


The optimum matched filter should match to the filtered signal:
Equation:
$$\forall f: \quad H_m^{opt}(f) = \overline{S_m(f)\,G(f)}\;e^{-j2\pi f T}$$


This filter is indeed optimum (i.e., it maximizes signal-to-noise ratio);
however, it requires knowledge of the channel impulse response. The signal
energy is changed to

Equation:
$$E_{\tilde{s}} = \int_{-\infty}^{\infty} |\tilde{S}_m(f)|^2\,df$$


The band limited nature of the channel and the stream of time limited 
modulated signal create aliasing which is referred to as intersymbol 
interference. We will investigate ISI for a general PAM signaling. 


Introduction to ISI 


A typical baseband digital system is described in Figure 1(a). At the
transmitter, the modulated pulses are filtered to comply with some
bandwidth constraint. These pulses are distorted by the reactances of the
cable or by fading in wireless systems. Figure 1(b) illustrates a
convenient model, lumping all the filtering into one overall equivalent
system transfer function.


Figure 1: Intersymbol interference in the detection process. (a) Typical
baseband digital system. (b) Equivalent model


Due to the effects of system filtering, the received pulses can overlap one
another as shown in Figure 1(b). Such interference is termed intersymbol
interference (ISI). Even in the absence of noise, the effects of filtering and
channel-induced distortion lead to ISI.


Nyquist investigated and showed that the theoretical minimum system
bandwidth needed in order to detect $R_s$ symbols/s, without ISI, is $R_s/2$ or
$1/2T$ hertz. For baseband systems, when $H(f)$ is such a filter with single-sided
bandwidth $1/2T$ (the ideal Nyquist filter) as shown in figure 2a, its
impulse response is of the form $h(t) = \mathrm{sinc}(t/T)$, shown in figure 2b. This
$\mathrm{sinc}(t/T)$-shaped pulse is called the ideal Nyquist pulse. Even though two
successive pulses $h(t)$ and $h(t - T)$ have long tails, the figure shows the tail
of $h(t)$ passing through zero amplitude at the instant when $h(t - T)$ is to
be sampled. Therefore, assuming that the synchronization is perfect, there
will be no ISI.


Figure 2: Nyquist channels for zero ISI. (a) Rectangular system transfer
function $H(f)$. (b) Received pulse shape $h(t) = \mathrm{sinc}(t/T)$


The names "Nyquist filter" and "Nyquist pulse" are often used to describe 
the general class of filtering and pulse-shaping that satisfy zero ISI at the 
sampling points. Among the class of Nyquist filters, the most popular ones 
are the raised cosine and root-raised cosine. 


A fundamental parameter for communication systems is bandwidth
efficiency, $R/W$ bits/s/Hz. For ideal Nyquist filtering, the theoretical
maximum symbol-rate packing without ISI is 2 symbols/s/Hz. For
example, with 64-ary PAM, $M = 64 = 2^6$ amplitudes, the theoretical
maximum bandwidth efficiency possible without ISI is
6 bits/symbol × 2 symbols/s/Hz = 12 bits/s/Hz.


Pulse Amplitude Modulation Through Bandlimited Channel 
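Consider a PAM system $b_{-10}, \ldots, b_{-1}, b_0, b_1, \ldots$


This implies
Equation:
$$\forall a_n,\ a_n \in \{M \text{ levels of amplitude}\}: \quad x_t = \sum_{n=-\infty}^{\infty} a_n\,s(t - nT)$$


The received signal is
Equation:
$$r_t = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} a_n\,s(t - \tau - nT)\,g(\tau)\,d\tau + N_t$$
$$= \sum_{n=-\infty}^{\infty} a_n \int_{-\infty}^{\infty} s(t - \tau - nT)\,g(\tau)\,d\tau + N_t$$
$$= \sum_{n=-\infty}^{\infty} a_n\,\tilde{s}(t - nT) + N_t$$


Since the signals span a one-dimensional space, one filter matched to
$\tilde{s}(t) = s * g(t)$ is sufficient.


The matched filter's impulse response is
Equation:
$$\forall t: \quad h^{opt}(t) = s * g(T - t)$$


The matched filter output is
Equation:
$$y(t) = \int_{-\infty}^{\infty} \sum_{n=-\infty}^{\infty} a_n\,\tilde{s}(t - \tau - nT)\,h^{opt}(\tau)\,d\tau + \nu(t)$$
$$= \sum_{n=-\infty}^{\infty} a_n \int_{-\infty}^{\infty} \tilde{s}(t - \tau - nT)\,h^{opt}(\tau)\,d\tau + \nu(t)$$
$$= \sum_{n=-\infty}^{\infty} a_n\,u(t - nT) + \nu(t)$$


The decision on the $k$-th symbol is obtained by sampling the MF output at
$kT$:
Equation:
$$y(kT) = \sum_{n=-\infty}^{\infty} a_n\,u(kT - nT) + \nu(kT)$$


The $k$-th symbol is of interest:
Equation:
$$y(kT) = a_k\,u(0) + \sum_{n=-\infty,\ n \ne k}^{\infty} a_n\,u(kT - nT) + \nu(kT)$$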


Since the channel is bandlimited, it provides memory for the transmission 
system. The effect of old symbols (possibly even future signals) lingers and 
affects the performance of the receiver. The effect of ISI can be eliminated 
or controlled by proper design of modulation signals or precoding filters 
at the transmitter, or by equalizers or sequence detectors at the receiver. 
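
The split of a matched filter sample into the desired term $a_k u(0)$ and the ISI sum can be illustrated with a short sketch. The sampled pulse values $u(jT)$ and the symbol sequence below are hypothetical, and the noise term is omitted.

```python
def matched_filter_sample(a, u, k):
    """Return (desired, isi) parts of y(kT) = sum_n a_n u((k - n)T), noise omitted.

    a: dict mapping n to the amplitude a_n
    u: dict mapping j to u(jT); missing entries are treated as zero
    """
    desired = a[k] * u.get(0, 0.0)
    isi = sum(a_n * u.get(k - n, 0.0) for n, a_n in a.items() if n != k)
    return desired, isi


# Hypothetical sampled overall pulse u(jT) and binary PAM symbols a_n.
u = {-1: 0.1, 0: 1.0, 1: 0.25, 2: -0.05}
a = {0: 1.0, 1: -1.0, 2: 1.0, 3: 1.0}

desired, isi = matched_filter_sample(a, u, k=2)
print(desired, isi, desired + isi)
```

Here both past symbols ($n = 0, 1$) and one future symbol ($n = 3$) contribute to the ISI on symbol $k = 2$, which reduces the sample from 1.0 to 0.8.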


Precoding and Bandlimited Signals 


Precoding 


The data symbols are manipulated such that
Equation:
$$y(kT) = a_k\,u(0) + \mathrm{ISI} + \nu(kT)$$


Design of Bandlimited Modulation Signals 


Recall that modulation signals are


Equation:
$$X_t = \sum_{n=-\infty}^{\infty} a_n\,s(t - nT)$$
We can design $s(t)$ such that
Equation:
$$u(nT) = \begin{cases} \text{large} & \text{if } n = 0 \\ \text{zero or small} & \text{if } n \ne 0 \end{cases}$$


where $y(kT) = a_k u(0) + \sum_{n=-\infty}^{\infty} a_n u(kT - nT) + \nu(kT)$ (ISI is the
sum term, and once again, $n \ne k$). Also, $u(nT) = s * g * h^{opt}(nT)$. The signal
$s(t)$ can be designed to have reduced ISI.


Design Equalizers at the Receiver 


Linear equalizers or decision feedback equalizers reduce ISI in the statistic
$y_t$.


Maximum Likelihood Sequence Detection 


Equation:
$$y(kT) = \sum_{n=-\infty}^{\infty} a_n\,u(kT - nT) + \nu(kT)$$


By observing $y(T), y(2T), \ldots$ the data symbols are observed frequently.
Therefore, ISI can be viewed as diversity to increase performance.


Pulse Shaping to Reduce ISI 
The Raised-Cosine Filter 


A transfer function belonging to the Nyquist class (zero ISI at the sampling
times) is the raised-cosine filter. It can be expressed as


$$H(f) = \begin{cases} 1 & |f| < 2W_0 - W \\ \cos^2\left(\frac{\pi}{4}\,\frac{|f| + W - 2W_0}{W - W_0}\right) & 2W_0 - W < |f| < W \\ 0 & |f| > W \end{cases} \quad (1a)$$

$$h(t) = 2W_0\,\mathrm{sinc}(2W_0 t)\,\frac{\cos[2\pi(W - W_0)t]}{1 - [4(W - W_0)t]^2} \quad (1b)$$


where $W$ is the absolute bandwidth. $W_0 = 1/2T$ represents the minimum
bandwidth for the rectangular spectrum and the -6 dB bandwidth (or half-amplitude
point) for the raised-cosine spectrum. $W - W_0$ is termed the
"excess bandwidth".


The roll-off factor is defined to be $r = \frac{W - W_0}{W_0}$ (2), where $0 \le r \le 1$.


With the Nyquist constraint $W_0 = R_s/2$, equation (2) can be rewritten as


$$W = \frac{1}{2}(1 + r)R_s$$


Figure 1: Raised-cosine filter characteristics. (a) System transfer function.
(b) System impulse response


The raised-cosine characteristic is illustrated in figure 1 for
$r = 0$, $r = 0.5$, $r = 1$. When $r = 1$, the required excess bandwidth is 100%,
and the system can provide a symbol rate of $R_s$ symbols/s using a
bandwidth of $R_s$ hertz (twice the Nyquist minimum bandwidth), thus
yielding a symbol-rate packing of 1 symbol/s/Hz.


The larger the filter roll-off, the shorter the pulse tail will be. Small tails
exhibit less sensitivity to timing errors and thus make for small degradation
due to ISI.


The smaller the filter roll-off, the smaller the excess bandwidth will be. The
cost is longer pulse tails, larger pulse amplitudes, and thus, greater
sensitivity to timing errors.
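
The zero-ISI property of the raised-cosine pulse can be checked numerically from its impulse response $h(t) = 2W_0\,\mathrm{sinc}(2W_0 t)\cos[2\pi(W - W_0)t] / (1 - [4(W - W_0)t]^2)$ with $W - W_0 = rW_0$. The sketch below is illustrative; the function names and the choice $T = 1$ are assumptions, and the removable singularity of the formula is handled with its limit value.

```python
import math


def sinc(x):
    """Normalized sinc: sin(pi x) / (pi x), with sinc(0) = 1."""
    return 1.0 if x == 0 else math.sin(math.pi * x) / (math.pi * x)


def raised_cosine_h(t, T, r):
    """Raised-cosine impulse response for symbol time T and roll-off r."""
    w0 = 1.0 / (2.0 * T)
    excess = r * w0                                 # W - W0
    denom = 1.0 - (4.0 * excess * t) ** 2
    if abs(denom) < 1e-12:
        # Limit of cos(2 pi e t) / (1 - (4 e t)^2) as 4 e t -> 1 is pi / 4.
        return 2.0 * w0 * (math.pi / 4.0) * sinc(2.0 * w0 * t)
    return 2.0 * w0 * sinc(2.0 * w0 * t) * math.cos(2.0 * math.pi * excess * t) / denom


def absolute_bandwidth(r, symbol_rate):
    """W = (1 + r) R_s / 2."""
    return 0.5 * (1.0 + r) * symbol_rate


T = 1.0                                             # hypothetical symbol time
for r in (0.0, 0.5, 1.0):
    samples = [raised_cosine_h(n * T, T, r) for n in range(-3, 4)]
    print(r, absolute_bandwidth(r, 1.0 / T), [round(v, 6) for v in samples])
```

For every roll-off the samples at nonzero multiples of $T$ vanish, while the absolute bandwidth grows from $R_s/2$ at $r = 0$ to $R_s$ at $r = 1$.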


The Root Raised-Cosine Filter 


Recall that the raised-cosine frequency transfer function describes the 
composite H(f) including transmitting filter, channel filter and receiving 
filter. The filtering at the receiver is chosen so that the overall transfer 
function is a form of raised-cosine. Often this is accomplished by choosing 
both the receiving filter and the transmitting filter so that each has a transfer 
function known as a root raised cosine. Neglecting any channel-induced 
ISI, the product of these root-raised cosine functions yields the composite 
raised-cosine system transfer function. 


Two Types of Error-Performance Degradation 


Error-performance degradation can be classified into two groups. The first
is due to a decrease in received signal power or an increase in noise or
interference power, giving rise to a loss in signal-to-noise ratio $E_b/N_0$. The
second is due to signal distortion such as ISI.


Figure 1: Bit error probability $P_B$ versus $E_b/N_0$ (dB); curve (1) shows a loss in $E_b/N_0$


Suppose that we need a communication system with a bit-error probability
$P_B$ versus $E_b/N_0$ characteristic corresponding to the solid-line curve
plotted in figure 1. Suppose that after the system is configured, the
performance does not follow the theoretical curve, but in fact follows the
dashed line plot (1). This corresponds to a loss in $E_b/N_0$ due to some signal
losses or an increased level of noise or interference. This loss in $E_b/N_0$ is not so
terrible when compared with the possible effects of degradation caused by a
distortion mechanism, corresponding to the dashed line plot (2). Instead of
suffering a simple loss in signal-to-noise ratio, there is a degradation effect
brought about by ISI. If there is no solution to this problem, no amount
of $E_b/N_0$ will improve it. More $E_b/N_0$ cannot
help the ISI problem because increasing $E_b/N_0$ does not change
the overlap of the pulses.


Eye Pattern 


An eye pattern is the display that results from measuring a system's
response to baseband signals in a prescribed way.


Figure 1: Eye pattern, with the optimum sampling time marked


Figure 1 describes the eye pattern that results for binary pulse
signalling. The width of the opening indicates the time over which sampling
for detection might be performed. The optimum sampling time corresponds
to the maximum eye opening, yielding the greatest protection against noise.
If there were no filtering in the system then the pattern would look like a
box rather than an eye. In figure 1, $D_A$, the range of amplitude differences,
is a measure of distortion caused by ISI.


$J_T$, the range of time differences of the zero crossings, is a measure of
the timing jitter. $M_N$ is a measure of noise margin. $S_T$ is a measure of
sensitivity to timing error.


In general, the most frequent use of the eye pattern is for qualitatively
assessing the extent of the ISI. As the eye closes, ISI is increasing; as the eye
opens, ISI is decreasing.


Transversal Equalizer 


A training sequence used for equalization is often chosen to be a noise-like 
sequence which is needed to estimate the channel frequency response. 


In the simplest sense, training sequence might be a single narrow pulse, but 
a pseudonoise (PN) signal is preferred in practise because the PN signal has 
larger average power and hence larger SNR for the same peak transmitted 
power. 


0.9 


Received pulse exhibiting distortion 


Consider that a single pulse was transmitted over a system designed to
have a raised-cosine transfer function $H_{RC}(f) = H_t(f) \cdot H_r(f)$. Also
consider that the channel induces ISI, so that the received demodulated
pulse exhibits distortion, as shown in figure 1, such that the pulse sidelobes
do not go through zero at sample times. To achieve the desired raised-cosine
transfer function, the equalizing filter should have a frequency response


$$H_e(f) = \frac{1}{H_c(f)} = \frac{1}{|H_c(f)|}\,e^{-j\theta_c(f)} \quad (1)$$

In other words, we would like the equalizing filter to generate a set of
canceling echoes. The transversal filter, illustrated in figure 2, is the most
popular form of an easily adjustable equalizing filter, consisting of a delay
line with T-second taps (where T is the symbol duration). The tap weights
could be chosen to force the system impulse response to zero at all but one
of the sampling times, thus making $H_e(f)$ correspond exactly to the inverse
of the channel transfer function $H_c(f)$.


Algorithm for 
Coefficient adjustment 


Transversal filter 


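Consider that there are $2N + 1$ taps with weights $c_{-N}, c_{-N+1}, \ldots, c_N$. Output
samples $z(k)$ are the convolution of the input samples $x(k)$ and tap weights $c_n$,
as follows:


$$z(k) = \sum_{n=-N}^{N} x(k - n)\,c_n, \quad k = -2N, \ldots, 2N \quad (2)$$


By defining the vectors $z$ and $c$ and the matrix $x$ as, respectively,


$$z = \begin{bmatrix} z(-2N) \\ \vdots \\ z(0) \\ \vdots \\ z(2N) \end{bmatrix}, \quad c = \begin{bmatrix} c_{-N} \\ \vdots \\ c_0 \\ \vdots \\ c_N \end{bmatrix}$$

$$x = \begin{bmatrix} x(-N) & 0 & 0 & \cdots & 0 & 0 \\ x(-N+1) & x(-N) & 0 & \cdots & \cdots & \cdots \\ \vdots & & & & & \vdots \\ x(N) & x(N-1) & x(N-2) & \cdots & x(-N+1) & x(-N) \\ \vdots & & & & & \vdots \\ 0 & 0 & 0 & \cdots & x(N) & x(N-1) \\ 0 & 0 & 0 & \cdots & 0 & x(N) \end{bmatrix}$$


we can describe the relationship among $z(k)$, $x(k)$ and $c_n$ more compactly
as


$$z = x\,c \quad (3a)$$


Whenever the matrix $x$ is square, we can find $c$ by solving the following
equation:


$$c = x^{-1} z \quad (3b)$$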


Notice that the index $k$ was arbitrarily chosen to allow for $4N + 1$ sample
points. The vectors $z$ and $c$ have dimensions $4N + 1$ and $2N + 1$. Such
equations are referred to as an overdetermined set. This problem can be
solved in a deterministic way, known as the zero-forcing solution, or in a
statistical way, known as the minimum mean-square error (MSE) solution.


Zero-Forcing Solution 


At first, by disposing of the top $N$ rows and bottom $N$ rows, matrix $x$ is
transformed into a square matrix of dimension $2N + 1$ by $2N + 1$. Then
equation $c = x^{-1} z$ is used to solve the $2N + 1$ simultaneous equations for
the set of $2N + 1$ weights $c_n$. This solution minimizes the peak ISI
distortion by selecting the $c_n$ weights so that the equalizer output is forced
to zero at $N$ sample points on either side of the desired pulse.


$$z(k) = \begin{cases} 1 & k = 0 \\ 0 & k = \pm 1, \pm 2, \ldots, \pm N \end{cases}$$

For such an equalizer with finite length, the peak distortion is guaranteed to 
be minimized only if the eye pattern is initially open. However, for high- 
speed transmission and channels introducing much ISI, the eye is often 
closed before equalization. Since the zero-forcing equalizer neglects the 
effect of noise, it is not always the best system solution. 
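
The zero-forcing procedure can be sketched for a three-tap equalizer ($N = 1$). The distorted pulse samples below are hypothetical, and the small Gaussian-elimination routine is only a stand-in for any linear solver.

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda row: abs(m[row][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            f = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= f * m[col][j]
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        tail = sum(m[row][j] * x[j] for j in range(row + 1, n))
        x[row] = (m[row][n] - tail) / m[row][row]
    return x


def zero_forcing_taps(x, n_side):
    """x: dict k -> x(k). Returns [c_-N, ..., c_N] with z(0)=1, z(k)=0 for 0<|k|<=N."""
    idx = range(-n_side, n_side + 1)
    matrix = [[x.get(k - n, 0.0) for n in idx] for k in idx]   # square part of x
    target = [1.0 if k == 0 else 0.0 for k in idx]
    return solve_linear(matrix, target)


def equalize(x, c, n_side, k):
    """z(k) = sum_n x(k - n) c_n."""
    return sum(x.get(k - n, 0.0) * c[n + n_side] for n in range(-n_side, n_side + 1))


# Hypothetical received pulse samples x(-2), ..., x(2).
x = {-2: 0.0, -1: 0.2, 0: 0.9, 1: -0.3, 2: 0.1}
c = zero_forcing_taps(x, 1)
z = [equalize(x, c, 1, k) for k in range(-3, 4)]
print([round(v, 4) for v in c])
print([round(v, 4) for v in z])
```

The output is forced to 1 at $k = 0$ and to 0 at $k = \pm 1$, but residual ISI remains at the samples outside the span of the equalizer.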


Minimum MSE Solution 


A more robust equalizer is obtained if the $c_n$ tap weights are chosen to
minimize the mean-square error (MSE) of all the ISI terms plus the noise
power at the output of the equalizer. MSE is defined as the expected value
of the squared difference between the desired data symbol and the estimated
data symbol.


By multiplying both sides of equation (3a), $z = x\,c$, by $x^T$, we have

$$x^T z = x^T x\,c \quad (5)$$

and

$$R_{xz} = R_{xx}\,c \quad (6)$$


where $R_{xz} = x^T z$ is called the cross-correlation vector and $R_{xx} = x^T x$ is
called the autocorrelation matrix of the input noisy signal. In practice, $R_{xz}$ and
$R_{xx}$ are unknown, but they can be approximated by transmitting a test
signal and using time-average estimates to solve for the tap weights from
equation (6) as follows:


$$c = R_{xx}^{-1} R_{xz}$$


Most high-speed telephone-line modems use an MSE weight criterion 
because it is superior to a zero-forcing criterion; it is more robust in the 
presence of noise and large ISI. 


Decision Feedback Equalizer 


The basic limitation of a linear equalizer, such as the transversal filter, is its
poor performance on channels having spectral nulls. A decision feedback
equalizer (DFE) is a nonlinear equalizer that uses previous detector decisions
to eliminate the ISI on pulses that are currently being demodulated. In other
words, the distortion on a current pulse that was caused by previous pulses
is subtracted.


Figure 1: Decision feedback equalizer


Figure 1 shows a simplified block diagram of a DFE where the forward 
filter and the feedback filter can each be a linear filter, such as transversal 
filter. The nonlinearity of the DFE stems from the nonlinear characteristic 
of the detector that provides an input to the feedback filter. The basic idea 
of a DFE is that if the values of the symbols previously detected are known, 
then ISI contributed by these symbols can be canceled out exactly at the 
output of the forward filter by subtracting past symbol values with 
appropriate weighting. The forward and feedback tap weights can be 
adjusted simultaneously to fulfill a criterion such as minimizing the MSE. 


The advantage of a DFE implementation is that the feedback filter, which is
additionally working to remove ISI, operates on noiseless quantized levels,
and thus its output is free of channel noise.


Adaptive Equalization 


Another type of equalization, capable of tracking a slowly time-varying 
channel response, is known as adaptive equalization. It can be implemented 
to perform tap-weight adjustments periodically or continually. Periodic 
adjustments are accomplished by periodically transmitting a preamble or 
short training sequence of digital data known by the receiver. Continual
adjustments are accomplished by replacing the known training sequence
with a sequence of data symbols estimated from the equalizer output and
treated as known data. When performed continually and automatically in 
this way, the adaptive procedure is referred to as decision directed. 


If the probability of error exceeds one percent, the decision directed 
equalizer might not converge. A common solution to this problem is to 
initialize the equalizer with an alternate process, such as a preamble to 
provide good channel-error performance, and then switch to decision- 
directed mode. 


The simultaneous equations described in equation (3) of module 
“Transversal Equalizer” do not include the effects of channel noise. To 
obtain a stable solution for the filter weights, either the data must be 
averaged to obtain stable signal statistics, or the noisy solutions obtained 
from the noisy data must be averaged. The most robust algorithm that 
averages noisy solutions is the least-mean-square (LMS) algorithm. Each 
iteration of this algorithm uses a noisy estimate of the error gradient to 
adjust the weights in the direction that reduces the average mean-square error. 


The noisy gradient is simply the product $e(k)\,\mathbf{r}_x$ of an error scalar $e(k)$ and 
the data vector $\mathbf{r}_x$. 


$e(k) = z(k) - \hat{z}(k)$ (1) 


Where $z(k)$ and $\hat{z}(k)$ are the desired output signal (a sample free of ISI) 
and the estimate at time k. 


$\hat{z}(k) = \mathbf{c}^T \mathbf{r}_x = \sum_{n=-N}^{N} x(k-n)\,c_n$ (2) 


Where $\mathbf{c}^T$ is the transpose of the weight vector at time k. 


The iterative process that updates the set of weights is obtained as follows: 
$\mathbf{c}(k+1) = \mathbf{c}(k) + \Delta\, e(k)\, \mathbf{r}_x$ (3) 


Where $\mathbf{c}(k)$ is the vector of filter weights at time k, and $\Delta$ is a small term 
that limits the coefficient step size and thus controls the rate of convergence 
of the algorithm as well as the variance of the steady state solution. Stability 
is assured if the parameter $\Delta$ is smaller than the reciprocal of the energy of 
the data in the filter. Thus, while we want the convergence parameter $\Delta$ to 
be large for fast convergence but not so large as to be unstable, we also 
want it to be small enough for low variance. 
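
The LMS update of equation (3) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two-tap channel, the noise level, the step size 0.01, the 7-tap equalizer, and the center-tap initialization are all hypothetical choices that do not come from the text, and the training symbols are assumed known to the receiver (training mode rather than decision-directed mode).

```python
import random

def simulate_channel(symbols, taps=(1.0, 0.4), noise_std=0.01, seed=1):
    """Pass symbols through an FIR channel (illustrative taps) and add Gaussian noise."""
    rng = random.Random(seed)
    out = []
    for k in range(len(symbols)):
        sample = sum(h * symbols[k - i] for i, h in enumerate(taps) if k - i >= 0)
        out.append(sample + rng.gauss(0.0, noise_std))
    return out

def lms_train(received, desired, n_side=3, step=0.01):
    """Train a (2N+1)-tap equalizer with c(k+1) = c(k) + step * e(k) * r_x."""
    weights = [0.0] * (2 * n_side + 1)
    weights[n_side] = 1.0  # center-tap start: an assumption, not from the text
    squared_errors = []
    for k in range(n_side, len(received) - n_side):
        # r_x holds x(k - n) for n = -N, ..., N
        r_x = [received[k - n] for n in range(-n_side, n_side + 1)]
        z_hat = sum(c * x for c, x in zip(weights, r_x))            # equation (2)
        e = desired[k] - z_hat                                      # equation (1)
        weights = [c + step * e * x for c, x in zip(weights, r_x)]  # equation (3)
        squared_errors.append(e * e)
    return weights, squared_errors

rng = random.Random(0)
symbols = [rng.choice((-1.0, 1.0)) for _ in range(5000)]
received = simulate_channel(symbols)
weights, squared_errors = lms_train(received, symbols)
early = sum(squared_errors[:100]) / 100
late = sum(squared_errors[-100:]) / 100
print(f"mean squared error: first 100 = {early:.4f}, last 100 = {late:.4f}")
```

With these assumed values the step size is well below the reciprocal of the data energy in the filter, so the squared error should fall as the weights approach the inverse of the channel.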


Channel Capacity 


In the previous section, we discussed information sources and quantified 
information. We also discussed how to represent (and compress) 
information sources in binary symbols in an efficient manner. In this 
section, we consider channels and will find out how much information can 
be sent through the channel reliably. 


We will first consider simple channels where the input is a discrete random 
variable and the output is also a discrete random variable. These discrete 
channels could represent analog channels with modulation and 
demodulation and detection. 


Discrete Channel 


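Let us denote the input sequence to the channel as 
Equation: 


$\mathbf{X} = (X_1, X_2, \ldots, X_n)^T$ 


where $X_i \in \mathcal{X}$, a discrete symbol set or input alphabet. 


The channel output 
Equation: 


$\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ 


where $Y_i \in \mathcal{Y}$, a discrete symbol set or output alphabet. 


The statistical properties of a channel are determined if one finds 
$p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x})$ for all $\mathbf{y} \in \mathcal{Y}^n$ and for all $\mathbf{x} \in \mathcal{X}^n$. A discrete channel is called 
a discrete memoryless channel if 
Equation: 


$p_{\mathbf{Y}|\mathbf{X}}(\mathbf{y}|\mathbf{x}) = \prod_{i=1}^{n} p_{Y_i|X_i}(y_i|x_i)$ 

for all $\mathbf{y} \in \mathcal{Y}^n$ and for all $\mathbf{x} \in \mathcal{X}^n$. 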


Example: 

A binary symmetric channel (BSC) is a discrete memoryless channel with 
binary input and binary output and $p_{Y|X}(y=0|x=1) = p_{Y|X}(y=1|x=0)$. 
As an example, a white Gaussian channel with antipodal signaling and 
matched filter receiver has probability of error of $Q\left(\sqrt{2E_s/N_0}\right)$. Since the 
error is symmetric with respect to the transmitted bit, then 
Equation: 


$p_{Y|X}(0|1) = p_{Y|X}(1|0) = Q\left(\sqrt{\frac{2E_s}{N_0}}\right) = \epsilon$ 

It is interesting to note that every time a BSC is used one bit is sent across 
the channel with probability of error of $\epsilon$. The question is how much 
information or how many bits can be sent per channel use, reliably. Before 
we consider the above question a few definitions are essential. These are 
discussed in mutual information. 
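
As a rough numerical illustration of the crossover probability above, the following Python sketch evaluates $Q(\sqrt{2E_s/N_0})$ through the complementary error function. The function names and the chosen values of $E_s/N_0$ are illustrative assumptions only.

```python
import math

def q_function(x):
    """Gaussian tail probability Q(x) = 0.5 * erfc(x / sqrt(2))."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def bsc_crossover(es_over_n0_db):
    """Crossover probability epsilon = Q(sqrt(2 Es/N0)) for antipodal signaling."""
    es_over_n0 = 10.0 ** (es_over_n0_db / 10.0)
    return q_function(math.sqrt(2.0 * es_over_n0))

for snr_db in (0, 4, 8):  # example operating points, chosen arbitrarily
    print(f"Es/N0 = {snr_db} dB -> epsilon = {bsc_crossover(snr_db):.3e}")
```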


Mutual Information 


Recall that 
Equation: 


$H(X,Y) = -\sum_{x}\sum_{y} p_{X,Y}(x,y) \log p_{X,Y}(x,y)$ 


Equation: 


$H(X,Y) = H(Y) + H(X|Y) = H(X) + H(Y|X)$ 


Mutual Information 
The mutual information between two discrete random variables is 
denoted by $I(X;Y)$ and defined as 
Equation: 


$I(X;Y) = H(X) - H(X|Y)$ 


Mutual information is a useful concept to measure the amount of 
information shared between input and output of noisy channels. 
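
The definition can be checked numerically. The sketch below, written in Python purely for illustration, computes $I(X;Y) = H(X) - H(X|Y)$ from a joint probability table using $H(X|Y) = H(X,Y) - H(Y)$. The table layout `joint[x][y]` and the example channels are assumptions made for this example.

```python
import math

def entropy(probabilities):
    """Entropy in bits; terms with zero probability contribute nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def mutual_information(joint):
    """I(X;Y) = H(X) - H(X|Y), where joint[x][y] = p_XY(x, y)."""
    p_x = [sum(row) for row in joint]
    p_y = [sum(col) for col in zip(*joint)]
    h_xy = entropy([p for row in joint for p in row])
    h_x_given_y = h_xy - entropy(p_y)
    return entropy(p_x) - h_x_given_y

eps = 0.1  # illustrative crossover probability
bsc_joint = [[0.5 * (1 - eps), 0.5 * eps],
             [0.5 * eps, 0.5 * (1 - eps)]]       # BSC with uniform input
independent = [[0.25, 0.25], [0.25, 0.25]]       # output tells nothing about input
noiseless = [[0.5, 0.0], [0.0, 0.5]]             # output equals input

for name, table in (("BSC", bsc_joint), ("independent", independent), ("noiseless", noiseless)):
    print(f"{name}: I(X;Y) = {mutual_information(table):.4f} bits")
```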


In our previous discussions it became clear that when the channel is noisy 
there may not be reliable communications. Therefore, the limiting factor 
could very well be reliability when one considers noisy channels. Claude E. 
Shannon in 1948 changed this paradigm and stated a theorem that presents 
the rate (speed of communication) as the limiting factor as opposed to 
reliability. 


Example: 
Consider a discrete memoryless channel with four possible inputs and 
outputs. 


[Figure: channel transition diagram with inputs a, b, c, d and outputs a, b, c, d] 


Every time the channel is used, one of the four symbols will be 
transmitted. Therefore, 2 bits are sent per channel use. The system, 
however, is very unreliable. For example, if "a" is received, the receiver 
can not determine, reliably, if "a" was transmitted or "d". However, if the 
transmitter and receiver agree to only use symbols "a" and "c" and never 
use "b" and "d", then the transmission will always be reliable, but 1 bit is 
sent per channel use. Therefore, the rate of transmission was the limiting 
factor and not reliability. 


This is the essence of Shannon's noisy channel coding theorem, i.e., using 
only those inputs whose corresponding outputs are disjoint (e.g., far apart). 
The concept is appealing, but does not seem possible with binary channels 
since the input is either zero or one. It may work if one considers a vector of 
binary inputs referred to as the extension channel. 


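Input vector: $\mathbf{X} = (X_1, X_2, \ldots, X_n)^T \in \mathcal{X}^n = \{0,1\}^n$ 


Output vector: $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T \in \mathcal{Y}^n = \{0,1\}^n$ 


[Figure: the extension channel mapping $\mathcal{X}^n$ to $\mathcal{Y}^n$] 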


This module provides a description of the basic information necessary to 
understand Shannon's Noisy Channel Coding Theorem. However, for 
additional information on typical sequences, please refer to Typical 
Sequences. 


Typical Sequences 


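If the binary symmetric channel has crossover probability $\epsilon$ and $\mathbf{x}$ is transmitted, then by the Law of 
Large Numbers the output $\mathbf{y}$ differs from $\mathbf{x}$ in about $n\epsilon$ places when n is very large. 
Equation: 


$d_H(\mathbf{x},\mathbf{y}) \approx n\epsilon$ 


The number of sequences of length n that differ from $\mathbf{x}$ in $n\epsilon$ places is 


Equation: 
$\binom{n}{n\epsilon} = \frac{n!}{(n\epsilon)!\,(n - n\epsilon)!}$ 


Example: 
$\mathbf{x} = (000)^T$, $\epsilon = \frac{1}{3}$, and $n\epsilon = 3 \times \frac{1}{3} = 1$. The number of output sequences different from $\mathbf{x}$ by one 
element: $\frac{3!}{1!\,2!} = \frac{3 \times 2 \times 1}{1 \times 2} = 3$, given by $(100)^T$, $(010)^T$, and $(001)^T$. 


Using Stirling's approximation 
Equation: 


$n! \approx n^n e^{-n} \sqrt{2\pi n}$ 


we can approximate 
Equation: 


$\binom{n}{n\epsilon} \approx 2^{n\left(-\epsilon \log_2 \epsilon - (1-\epsilon)\log_2(1-\epsilon)\right)} = 2^{nH_b(\epsilon)}$ 


where $H_b(\epsilon) \equiv -\epsilon \log_2 \epsilon - (1-\epsilon)\log_2(1-\epsilon)$ is the entropy of a binary memoryless source. For 
any $\mathbf{x}$ there are $2^{nH_b(\epsilon)}$ highly probable outputs that correspond to this input. 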
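
A quick numerical check of this approximation, sketched in Python under the assumption that $n\epsilon$ is rounded to the nearest integer, compares the exact exponent $\frac{1}{n}\log_2\binom{n}{n\epsilon}$ with $H_b(\epsilon)$. The values n = 1000 and $\epsilon$ = 0.1 are arbitrary example choices.

```python
import math

def binary_entropy(eps):
    """H_b(eps) in bits, with 0*log2(0) taken as 0."""
    if eps <= 0.0 or eps >= 1.0:
        return 0.0
    return -eps * math.log2(eps) - (1.0 - eps) * math.log2(1.0 - eps)

def log2_count_per_symbol(n, eps):
    """Exact exponent (1/n) * log2 C(n, n*eps) of the sequence count."""
    flips = round(n * eps)
    return math.log2(math.comb(n, flips)) / n

n, eps = 1000, 0.1  # example values only
print(f"exact exponent  : {log2_count_per_symbol(n, eps):.4f}")
print(f"binary entropy  : {binary_entropy(eps):.4f}")
```

The two numbers agree more closely as n grows, since the correction neglected by the approximation grows only logarithmically in n.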


Consider the output vector $\mathbf{Y}$ as a very long random vector with entropy $nH(Y)$. As discussed earlier, the 
number of typical (or highly probable) sequences is roughly $2^{nH(Y)}$. Therefore, $2^n$ is the total number of 
binary sequences, $2^{nH(Y)}$ is the number of typical sequences, and $2^{nH_b(\epsilon)}$ is the number of elements in a 
group of possible outputs for one input vector. The maximum number of input sequences that produce 
nonoverlapping output sequences is 

Equation: 


$M = \frac{2^{nH(Y)}}{2^{nH_b(\epsilon)}} = 2^{n(H(Y) - H_b(\epsilon))}$ 


[Figure: typical output sequences resulting from each input, and nontypical sequences] 


The number of distinguishable input sequences of length n is 
Equation: 


$2^{n(H(Y) - H_b(\epsilon))}$ 


The number of information bits that can be sent across the channel reliably per n channel uses is 
$n(H(Y) - H_b(\epsilon))$. The maximum reliable transmission rate per channel use is 
Equation: 


$R = \frac{\log_2 M}{n} = \frac{n(H(Y) - H_b(\epsilon))}{n} = H(Y) - H_b(\epsilon)$ 


The maximum rate can be increased by increasing $H(Y)$. Note that $H_b(\epsilon)$ is only a function of the 
crossover probability and cannot be minimized any further. 


The entropy of the channel output is the entropy of a binary random variable. If the input is chosen to be 
uniformly distributed with $p_X(0) = p_X(1) = \frac{1}{2}$, 
then 

Equation: 


$p_Y(0) = (1-\epsilon)\,p_X(0) + \epsilon\,p_X(1) = \frac{1}{2}$ 


and 
Equation: 


$p_Y(1) = (1-\epsilon)\,p_X(1) + \epsilon\,p_X(0) = \frac{1}{2}$ 


Then, $H(Y)$ takes its maximum value of 1, resulting in a maximum rate $R = 1 - H_b(\epsilon)$ when 
$p_X(0) = p_X(1) = \frac{1}{2}$. This result says that ordinarily one bit is transmitted across a BSC with reliability 
$1 - \epsilon$. If one needs the probability of error to reach zero, then one should reduce the transmission of 
information to $1 - H_b(\epsilon)$ and add redundancy. 


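Recall that for Binary Symmetric Channels (BSC) 
Equation: 


$H(Y|X) = p_X(0)H(Y|X=0) + p_X(1)H(Y|X=1)$ 
$\quad = p_X(0)\left(-(1-\epsilon)\log_2(1-\epsilon) - \epsilon\log_2\epsilon\right) + p_X(1)\left(-(1-\epsilon)\log_2(1-\epsilon) - \epsilon\log_2\epsilon\right)$ 
$\quad = -(1-\epsilon)\log_2(1-\epsilon) - \epsilon\log_2\epsilon$ 
$\quad = H_b(\epsilon)$ 


Therefore, the maximum rate indeed was 
Equation: 


$R = H(Y) - H(Y|X) = I(X;Y)$ 


Example: 

The maximum reliable rate for a BSC is $1 - H_b(\epsilon)$. The rate is 1 when $\epsilon = 0$ or $\epsilon = 1$. The rate is 0 
when $\epsilon = \frac{1}{2}$. 


[Figure: plot of $1 - H_b(\epsilon)$ versus $\epsilon$] 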


This module provides background information necessary for an understanding of Shannon's Noisy 
Channel Coding Theorem. It is also closely related to material presented in Mutual Information. 


Shannon's Noisy Channel Coding Theorem 


It is highly recommended that the information presented in Mutual 
Information and in Typical Sequences be reviewed before proceeding with 
this document. An introductory module on the theorem is available at Noisy 
Channel Theorems . 

Theorem 

Shannon's Noisy Channel Coding 


The capacity of a discrete-memoryless channel is given by 
Equation: 


$C = \max_{p_X(x)} I(X;Y)$ 


where $I(X;Y)$ is the mutual information between the channel input X 
and the output Y. If the transmission rate R is less than C, then for any 
$\epsilon > 0$ there exists a code with block length n large enough whose error 
probability is less than $\epsilon$. If $R > C$, the error probability of any code with 
any block length is bounded away from zero. 


Example: 

If we have a binary symmetric channel with crossover probability 0.1, 
then the capacity is $C = 1 - H_b(0.1) \approx 0.53$ bits per transmission. Therefore, it is possible to 
send 0.4 bits per channel use through the channel reliably. This means that we 
can take 400 information bits and map them into a code of length 1000 
bits. Then the whole code can be transmitted over the channel. About one 
hundred of those bits may be detected incorrectly, but the 400 information 
bits may be decoded correctly. 
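
The maximization in the capacity definition can be illustrated for this example with a coarse numerical search. The Python sketch below is only an illustration: it evaluates $I(X;Y) = H(Y) - H(Y|X)$ for a BSC on a grid of input probabilities (the grid size is an arbitrary assumption) and reports the largest value found.

```python
import math

def binary_entropy(p):
    """H_b(p) in bits, with 0*log2(0) taken as 0."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1.0 - p) * math.log2(1.0 - p)

def bsc_mutual_information(p_zero, eps):
    """I(X;Y) = H(Y) - H(Y|X) for a BSC with P(X=0) = p_zero."""
    p_y_zero = (1.0 - eps) * p_zero + eps * (1.0 - p_zero)
    return binary_entropy(p_y_zero) - binary_entropy(eps)

def bsc_capacity_by_search(eps, grid=1000):
    """Maximize I(X;Y) over a grid of input distributions (numerical sketch)."""
    candidates = [i / grid for i in range(grid + 1)]
    best_p = max(candidates, key=lambda p: bsc_mutual_information(p, eps))
    return best_p, bsc_mutual_information(best_p, eps)

best_p, capacity = bsc_capacity_by_search(0.1)
print(f"best P(X=0) = {best_p:.3f}, capacity = {capacity:.4f} bits per channel use")
print("rate 400/1000 is below capacity:", 400 / 1000 < capacity)
```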


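Before we consider continuous-time additive white Gaussian channels, let's 
concentrate on discrete-time Gaussian channels 
Equation: 


$Y_i = X_i + \eta_i$ 


where the $X_i$'s are information bearing random variables and $\eta_i$ is a 
Gaussian random variable with variance $\sigma_\eta^2$. The input $X_i$'s are constrained 
to have power less than P 
Equation: 


$\frac{1}{n}\sum_{i=1}^{n} X_i^2 \le P$ 


Consider an output block of size n 
Equation: 


$\mathbf{Y} = \mathbf{X} + \boldsymbol{\eta}$ 


For large n, by the Law of Large Numbers, 
Equation: 


$\frac{1}{n}\sum_{i=1}^{n} \eta_i^2 = \frac{1}{n}\sum_{i=1}^{n} (y_i - x_i)^2 \le \sigma_\eta^2$ 


This indicates that with large probability as n approaches infinity, $\mathbf{Y}$ will be 
located in an n-dimensional sphere of radius $\sqrt{n\sigma_\eta^2}$ centered about $\mathbf{X}$ 
since $|\mathbf{y} - \mathbf{x}|^2 \le n\sigma_\eta^2$. 

On the other hand, since the $X_i$'s are power constrained and the $\eta_i$'s and $X_i$'s are 
independent, 

Equation: 


$\frac{1}{n}\sum_{i=1}^{n} y_i^2 \le P + \sigma_\eta^2$ 


Equation: 


$|\mathbf{Y}|^2 \le n\left(P + \sigma_\eta^2\right)$ 


This means $\mathbf{Y}$ is in a sphere of radius $\sqrt{n(P + \sigma_\eta^2)}$ centered around the 
origin. 


How many $\mathbf{X}$'s can we transmit to have nonoverlapping $\mathbf{Y}$ spheres in the 
output domain? The question is how many spheres of radius $\sqrt{n\sigma_\eta^2}$ fit in a 
sphere of radius $\sqrt{n(P + \sigma_\eta^2)}$. 

Equation: 


$M = \frac{\left(\sqrt{n(\sigma_\eta^2 + P)}\right)^n}{\left(\sqrt{n\sigma_\eta^2}\right)^n} = \left(1 + \frac{P}{\sigma_\eta^2}\right)^{n/2}$ 


Exercise: 


Problem: 
How many bits of information can one send in n uses of the channel? 


Solution: 
Equation: 


$\log_2\left(1 + \frac{P}{\sigma_\eta^2}\right)^{n/2}$ 


The capacity of a discrete-time Gaussian channel is $C = \frac{1}{2}\log_2\left(1 + \frac{P}{\sigma_\eta^2}\right)$ 
bits per channel use. 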


Now consider a continuous-time, bandlimited, additive white 
Gaussian channel with noise power spectral density $\frac{N_0}{2}$, input power constraint 
P, and bandwidth W. The system can be sampled at the Nyquist rate to 
provide power per sample P and noise power 
Equation: 


$\sigma_\eta^2 = \int_{-W}^{W} \frac{N_0}{2}\,df = W N_0$ 


The channel capacity is $\frac{1}{2}\log_2\left(1 + \frac{P}{N_0 W}\right)$ bits per transmission. Since the 
sampling rate is 2W, then 
Equation: 


$C = \frac{2W}{2}\log_2\left(1 + \frac{P}{N_0 W}\right)$ bits/trans. $\times$ trans./sec 


Equation: 


$C = W \log_2\left(1 + \frac{P}{N_0 W}\right)$ bits/sec 


Example: 
The capacity of the voice band of a telephone channel can be determined 
using the Gaussian model. The bandwidth is 3000 Hz and the signal to 
noise ratio is often 30 dB. Therefore, 
Equation: 


$C = 3000 \log_2(1 + 1000) \approx 30000$ bits/sec 


One should not expect to design modems faster than 30 kb/s using this 
model of telephone channels. It is also interesting to note that since the 
signal to noise ratio is large, we are expecting to transmit 10 
bits/second/Hertz across telephone channels. 
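
The two capacity formulas of this section are easy to evaluate numerically. The following Python sketch (function names are illustrative) reproduces the telephone-channel figure, taking the 30 dB signal to noise ratio as $P/(N_0 W) = 1000$.

```python
import math

def discrete_gaussian_capacity(power, noise_variance):
    """C = 0.5 * log2(1 + P / sigma^2), in bits per channel use."""
    return 0.5 * math.log2(1.0 + power / noise_variance)

def awgn_capacity(bandwidth_hz, snr_db):
    """C = W * log2(1 + SNR), in bits per second, with SNR = P / (N0 * W)."""
    snr = 10.0 ** (snr_db / 10.0)
    return bandwidth_hz * math.log2(1.0 + snr)

capacity = awgn_capacity(3000.0, 30.0)
print(f"telephone channel: {capacity:.0f} bits/sec, "
      f"{capacity / 3000.0:.2f} bits/sec/Hz")
```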


Channel Coding 


Channel coding is a viable method to reduce information rate through the 
channel and increase reliability. This goal is achieved by adding redundancy 
to the information symbol vector resulting in a longer coded vector of 
symbols that are distinguishable at the output of the channel. Another brief 
explanation of channel coding is offered in Channel Coding and the 
Repetition Code. We consider only two classes of codes, block codes and 
convolutional codes. 


Block codes 


The information sequence is divided into blocks of length k. Each block is 
mapped into channel inputs of length n. The mapping is independent from 
previous blocks, that is, there is no memory from one block to another. 


Example: 
$k = 2$ and $n = 5$ 
Equation: 

00 → 00000 
Equation: 

01 → 10100 
Equation: 

10 → 01111 
Equation: 

11 → 11011 


information sequence → codeword (channel input) 


A binary block code is completely defined by $2^k$ binary sequences of length 
n called codewords. 
Equation: 


$C = \{c_1, c_2, \ldots, c_{2^k}\}$ 
Equation: 


$c_i \in \{0,1\}^n$ 


There are three key questions, 


1. How can one find "good" codewords? 

2. How can one systematically map information sequences into 
codewords? 

3. How can one systematically find the corresponding information 
sequences from a codeword, i.e., how can we decode? 


These can be done if we concentrate on linear codes and utilize finite field 
algebra. 


A block code is linear if $c_i \in C$ and $c_j \in C$ implies $c_i \oplus c_j \in C$, where $\oplus$ 
is an elementwise modulo 2 addition. 


Hamming distance is a useful measure of codeword properties 
Equation: 


$d_H(c_i, c_j)$ = # of places that they are different 


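Denote the codeword for information sequence $e_1 = (1, 0, 0, \ldots, 0)^T$ by $g_1$, 
the codeword for $e_2 = (0, 1, 0, \ldots, 0)^T$ by $g_2$, ..., and the codeword for 
$e_k = (0, 0, \ldots, 0, 1)^T$ by $g_k$. Then any information 
sequence can be expressed as 
Equation: 

$u = (u_1, \ldots, u_k)^T = \sum_{i=1}^{k} u_i e_i$ 


and the corresponding codeword could be 
Equation: 


$c = \sum_{i=1}^{k} u_i g_i$ 


Therefore 


Equation: 


$c = uG$ 

with $c \in \{0,1\}^n$ and $u \in \{0,1\}^k$, where $G$ is the $k \times n$ matrix whose rows are $g_1, g_2, \ldots, g_k$, and 
all operations are modulo 2. 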
Example: 
In the previous example with 
Equation: 
00 → 00000 
Equation: 
01 → 10100 
Equation: 
10 → 01111 
Equation: 
11 → 11011 

$g_1 = (01111)^T$ and $g_2 = (10100)^T$ and $G = \begin{pmatrix} 0 & 1 & 1 & 1 & 1 \\ 1 & 0 & 1 & 0 & 0 \end{pmatrix}$ 


Additional information about coding efficiency and error are provided in 
Block Channel Coding. 
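
The encoding rule $c = uG$ for the example code can be sketched in Python as follows. The helper names are illustrative; the generator matrix is the one from the example, and all arithmetic is modulo 2.

```python
from itertools import product

# Generator matrix of the example code: rows are g_1 and g_2.
G = [(0, 1, 1, 1, 1),
     (1, 0, 1, 0, 0)]

def encode(u, generator=G):
    """Compute c = uG with all arithmetic modulo 2."""
    n = len(generator[0])
    return tuple(sum(u_i * row[j] for u_i, row in zip(u, generator)) % 2
                 for j in range(n))

def hamming_distance(a, b):
    """Number of places in which two codewords differ."""
    return sum(x != y for x, y in zip(a, b))

codebook = {u: encode(u) for u in product((0, 1), repeat=len(G))}
d_min = min(hamming_distance(a, b)
            for a in codebook.values() for b in codebook.values() if a != b)

for u, c in codebook.items():
    print("".join(map(str, u)), "->", "".join(map(str, c)))
print("minimum Hamming distance:", d_min)
```

The printed codebook matches the mapping in the example, and the minimum distance gives a first indication of how many channel errors the code can detect.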


Examples of good linear codes include Hamming codes, BCH codes, Reed-Solomon 
codes, and many more. The rate of these codes is defined as $\frac{k}{n}$, 
and these codes have different error correction and error detection 
properties. 


Convolutional Codes 


Convolutional codes are one type of code used for channel coding. Another 
type of code used is block coding. 


Convolutional codes 


In convolutional codes, each block of k bits is mapped into a block of n bits 
but these n bits are not only determined by the present k information bits 
but also by the previous information bits. This dependence can be captured 
by a finite state machine. 


Example: 
A rate $\frac{1}{2}$ convolutional coder, k = 1, n = 2, with memory length 2 and 
constraint length 3. 




Since the length of the shift register is 2, there are 4 different states. The 
behavior of the convolutional coder can be captured by a 4-state machine 
with states 00, 01, 10, and 11. 

For example, arrival of information bit 0 transitions from state 10 to state 
01. 

The encoding and the decoding process can be realized in a trellis structure. 


If the input sequence is 
1100 

the output sequence would be 
11 10 10 11 


The transmitted codeword is then 11 10 10 11. If there is one error on 
the channel, the received sequence could be 11 00 10 11. 


[Figure: trellis diagram with states 00, 01, 10, 11] 


Starting from state 00 the Hamming distance between the possible paths 
and the received sequence is measured. At the end, the path with minimum 
distance to the received sequence is chosen as the correct trellis path. The 
information sequence will then be determined. 
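
The decoding procedure can be sketched in Python. The encoder figure is not available here, so the tap connections below (first output $u \oplus s_2$, second output $u \oplus s_1 \oplus s_2$, where $s_1$ and $s_2$ are the two previous information bits) are an assumption, chosen because they reproduce the example mapping 1100 → 11 10 10 11 and the state transition from 10 to 01. The decoder is a hard-decision Viterbi search that keeps, for every state, the path with minimum Hamming distance to the received sequence.

```python
def encode(bits):
    """Rate-1/2, memory-2 encoder; the taps are an assumed choice (see lead-in)."""
    s1 = s2 = 0
    out = []
    for u in bits:
        out.extend((u ^ s2, u ^ s1 ^ s2))
        s1, s2 = u, s1          # shift the register
    return out

def viterbi_decode(received):
    """Minimum Hamming distance path through the trellis, starting in state 00."""
    survivors = {(0, 0): (0, [])}   # state -> (accumulated distance, decoded bits)
    for i in range(0, len(received), 2):
        pair = received[i:i + 2]
        updated = {}
        for (s1, s2), (dist, path) in survivors.items():
            for u in (0, 1):
                branch = (u ^ s2, u ^ s1 ^ s2)
                d = dist + (branch[0] != pair[0]) + (branch[1] != pair[1])
                nxt = (u, s1)
                if nxt not in updated or d < updated[nxt][0]:
                    updated[nxt] = (d, path + [u])
        survivors = updated
    return min(survivors.values(), key=lambda entry: entry[0])[1]

codeword = encode([1, 1, 0, 0])
received = [1, 1, 0, 0, 1, 0, 1, 1]     # 11 00 10 11: one channel error
print("codeword:", codeword)
print("decoded :", viterbi_decode(received))
```

For the received sequence 11 00 10 11 the surviving path with the smallest distance corresponds to the information sequence 1100, so the single channel error is corrected.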


Convolutional coding lends itself to very efficient trellis based encoding 
and decoding. They are very practical and powerful codes. 


Fading Channel 


For most channels, where signals propagate in the atmosphere and near the 
ground, the free-space propagation model is inadequate to describe the 
channel behavior and predict system performance. In a wireless system, a 
signal can travel from transmitter to receiver over multiple reflective paths. 
This phenomenon can cause fluctuations in the 
received signal’s amplitude, phase, and angle of arrival, giving rise to the 
terminology multipath fading. Another name, scintillation, is used to 
describe the fading caused by physical changes in the propagating medium, 
such as variations in the electron density of the ionospheric layers that 
reflect high frequency radio signals. Both fading and scintillation refer to a 
signal’s random fluctuations. 


Characterizing Mobile-Radio Propagation 


Figure 1: Fading channel manifestations 


Figure 1 gives an overview of the fading channel. Large-scale fading 
represents the average power attenuation or the path loss due to motion over 
large areas. This phenomenon is affected by prominent terrain contours 
(e.g. hills, forests, billboards, clumps of buildings, etc.) between the 
transmitter and receiver. Small-scale fading refers to the dramatic changes 
in signal amplitude and phase as a result of small changes (as small as half 
a wavelength) in the spatial positioning between a receiver and transmitter. 
Small-scale fading is called Rayleigh fading if there are multiple reflective 
paths and no line-of-sight signal component; otherwise it is called Rician. 
When a mobile radio roams over a large area it must process signals that 
experience both types of fading: small-scale fading superimposed on 
large-scale fading. Large-scale fading (attenuation or path loss) can be considered 
as a spatial average over the small-scale fluctuations of the signal. 


There are three basic mechanisms that impact signal propagation in a 
mobile communication system: 


1. Reflection occurs when a propagating electromagnetic wave impinges 
upon a smooth surface with very large dimensions relative to the RF 
signal wavelength. 

2. Diffraction occurs when the propagation path between the transmitter 
and receiver is obstructed by a dense body with dimensions that are 
large relative to the RF signal wavelength. Diffraction accounts for RF 
energy traveling from transmitter to receiver without a line-of-sight path. 
It is often termed shadowing because the diffracted field can reach the 
receiver even when shadowed by an impenetrable obstruction. 

3. Scattering occurs when a radio wave impinges on either a large, rough 
surface or any surface whose dimensions are on the order of the RF 
signal wavelength or less, causing the energy to be spread out or 
reflected in all directions. 


Link budget considerations for a fading channel 


Figure 2 is a convenient pictorial showing the various contributions that 
must be considered when estimating path loss for link budget analysis in a 
mobile radio application: (1) mean path loss as a function of distance, due 
to large-scale fading, (2) near-worst-case variations about the mean path 
loss or large-scale fading margin (typically 6-10 dB), (3) near-worst-case 
Rayleigh or small-scale fading margin (typically 20-30 dB) 


Using complex notation, a transmitted signal can be written as 
$s(t) = \mathrm{Re}\{g(t)\,e^{j2\pi f_c t}\}$ (1) 


Where Re{.} denotes the real part of {.}, and $f_c$ is the carrier frequency. 
The baseband waveform $g(t)$ is called the complex envelope of $s(t)$ and 
can be expressed as 


$g(t) = |g(t)|\,e^{j\phi(t)} = R(t)\,e^{j\phi(t)}$ (2) 
Where $R(t) = |g(t)|$ is the envelope magnitude, and $\phi(t)$ is its phase. 


In a fading environment, $g(t)$ will be modified by a complex dimensionless 
multiplicative factor $\alpha(t)\,e^{-j\theta(t)}$. The modified baseband waveform can be 
written as $\alpha(t)\,e^{-j\theta(t)}\,g(t)$. The magnitude of this envelope can be 
expressed as follows: 


$\alpha(t)R(t) = m(t)\,r_0(t)\,R(t)$ (3) 


Where $m(t)$ and $r_0(t)$ are called the large-scale-fading component and the 
small-scale-fading component of the envelope, respectively. 


Sometimes, $m(t)$ is referred to as the local mean or log-normal fading, and 
$r_0(t)$ is referred to as multipath or Rayleigh fading. 


For the case of mobile radio, figure 3 illustrates the relationship between 
$\alpha(t)$ and $m(t)$. In figure 3a, the signal power received is a function of the 
multiplicative factor $\alpha(t)$. Small-scale fading superimposed on large-scale 
fading can be readily identified. The typical antenna displacement between 
adjacent signal-strength nulls due to small-scale fading is approximately 
half a wavelength. In figure 3b, the large-scale fading or local mean $m(t)$ 
has been removed in order to view the small-scale fading $r_0(t)$. The 
log-normal fading is a relatively slowly varying function of position, while the 
Rayleigh fading is a relatively fast varying function of position. 
Figure 3: Large-scale fading and small-scale fading 


Large-Scale Fading 


In general, propagation models for both indoor and outdoor radio channels 
indicate that the mean path loss behaves as follows: 


$\bar{L}_p(d) \propto (d/d_0)^n$ (1) 
$\bar{L}_p(d)\,(\mathrm{dB}) = L_s(d_0)\,(\mathrm{dB}) + 10\,n\,\log_{10}(d/d_0)$ (2) 


Where d is the distance between transmitter and receiver, and the reference 
distance $d_0$ corresponds to a point located in the far field of the transmit 
antenna. Typically, $d_0$ is taken to be 1 km for large cells, 100 m for micro cells, 
and 1 m for indoor channels. Moreover, $L_s(d_0)$ is evaluated using the equation 


$L_s(d_0) = \left(\frac{4\pi d_0}{\lambda}\right)^2$ (3) 


or by conducting measurements. The value of the path-loss exponent n 
depends on the frequency, antenna height and propagation environment. In 
free space, n is equal to 2. In the presence of a very strong guided wave 
phenomenon (like urban streets), n can be lower than 2. When obstructions 
are present, n is larger. 


Measurements have shown that the path loss $L_p$ is a random variable 
having a log-normal distribution about the mean distance-dependent value 
$\bar{L}_p(d)$ 


$L_p(d)\,(\mathrm{dB}) = L_s(d_0)\,(\mathrm{dB}) + 10\,n\,\log_{10}(d/d_0) + X_\sigma\,(\mathrm{dB})$ (4) 


Where $X_\sigma$ denotes a zero-mean, Gaussian random variable (in dB) with 
standard deviation $\sigma$ (in dB). $X_\sigma$ is site and distance dependent. 


As can be seen from the equation, the parameters needed to statistically 
describe path loss due to large-scale fading, for an arbitrary location with a 
specific transmitter-receiver separation, are (1) the reference distance, (2) 
the path-loss exponent, and (3) the standard deviation $\sigma$ of $X_\sigma$. 
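
The log-normal path-loss model can be sketched in Python as follows. All numerical values used here (a wavelength of 1/3 m, corresponding to roughly 900 MHz, a 100 m reference distance, a path-loss exponent of 3, and an 8 dB standard deviation) are assumptions chosen only to illustrate the three parameters listed above.

```python
import math
import random

def free_space_loss_db(d0_m, wavelength_m):
    """L_s(d0) = (4*pi*d0/lambda)^2, expressed in dB."""
    return 20.0 * math.log10(4.0 * math.pi * d0_m / wavelength_m)

def mean_path_loss_db(d_m, d0_m, wavelength_m, exponent):
    """Mean path loss: L_s(d0) in dB plus 10*n*log10(d/d0)."""
    return free_space_loss_db(d0_m, wavelength_m) + 10.0 * exponent * math.log10(d_m / d0_m)

def path_loss_db(d_m, d0_m, wavelength_m, exponent, sigma_db, rng):
    """One random draw of the log-normal path loss: mean plus X_sigma (in dB)."""
    return mean_path_loss_db(d_m, d0_m, wavelength_m, exponent) + rng.gauss(0.0, sigma_db)

wavelength = 1.0 / 3.0      # metres; assumed example value
d0 = 100.0                  # micro-cell reference distance from the text
rng = random.Random(0)
print(f"L_s(d0)            = {free_space_loss_db(d0, wavelength):.2f} dB")
print(f"mean loss at 1 km  = {mean_path_loss_db(1000.0, d0, wavelength, 3.0):.2f} dB")
print(f"one random sample  = {path_loss_db(1000.0, d0, wavelength, 3.0, 8.0, rng):.2f} dB")
```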


Small-Scale Fading 


Small-scale fading refers to the dramatic changes in signal amplitude and 
phase that can be experienced as a result of small changes (as small as half 
a wavelength) in the spatial position between transmitter and receiver. 


In this section, we will develop the small-scale fading component $r_0(t)$. 
Analysis proceeds on the assumption that the antenna remains within a 
limited trajectory so that the effect of large-scale fading $m(t)$ is constant. 
Assume that the antenna is traveling and there are multiple scatter paths, 
each associated with a time-variant propagation delay $\tau_n(t)$ and a 
time-variant multiplicative factor $\alpha_n(t)$. Neglecting noise, the received bandpass 
signal can be written as below: 


$r(t) = \sum_n \alpha_n(t)\,s(t - \tau_n(t))$ (1) 


Substituting Equation (1) of module Characterizing Mobile-Radio 
Propagation into Equation (1), we can write the received bandpass 
signal as follows: 


$r(t) = \mathrm{Re}\left\{\sum_n \alpha_n(t)\,g(t - \tau_n(t))\,e^{j2\pi f_c (t - \tau_n(t))}\right\}$ 
$\quad = \mathrm{Re}\left\{\left[\sum_n \alpha_n(t)\,e^{-j2\pi f_c \tau_n(t)}\,g(t - \tau_n(t))\right] e^{j2\pi f_c t}\right\}$ (2) 
The equivalent received baseband signal is 
$z(t) = \sum_n \alpha_n(t)\,e^{-j2\pi f_c \tau_n(t)}\,g(t - \tau_n(t))$ (3) 


Consider the transmission of an unmodulated carrier at frequency $f_c$; in 
other words, $g(t) = 1$ for all time. The received baseband signal then becomes: 


$z(t) = \sum_n \alpha_n(t)\,e^{-j2\pi f_c \tau_n(t)} = \sum_n \alpha_n(t)\,e^{-j\theta_n(t)}$ (4) 


where $\theta_n(t) = 2\pi f_c \tau_n(t)$. The baseband signal $z(t)$ consists of a sum of time-variant components 
having amplitudes $\alpha_n(t)$ and phases $\theta_n(t)$. Notice that $\theta_n(t)$ will change by 
$2\pi$ radians whenever $\tau_n(t)$ changes by $1/f_c$ (a very small delay). These 
multipath components combine either constructively or destructively, 
resulting in amplitude variations of $z(t)$. Equation (4) is very important 
because it tells us that, although the bandpass signal $s(t)$ is the signal that experienced 
the fading effects and gave rise to the received signal $r(t)$, these effects can 
be described by analyzing $r(t)$ at the baseband level. 
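
The sensitivity of the unmodulated-carrier sum in equation (4) to very small delay changes can be illustrated with a two-path sketch in Python. The 900 MHz carrier, the 1 microsecond base delay, and the equal path amplitudes are assumed example values.

```python
import cmath
import math

def baseband_sum(amplitudes, delays_s, carrier_hz):
    """Sum of alpha_n * exp(-j*2*pi*f_c*tau_n): the g(t) = 1 case of equation (4)."""
    return sum(a * cmath.exp(-2j * math.pi * carrier_hz * tau)
               for a, tau in zip(amplitudes, delays_s))

f_c = 900e6      # carrier frequency in Hz (assumed)
tau = 1e-6       # delay of the first path in seconds (assumed)

# Second path delayed by a full carrier period: phases align.
in_phase = abs(baseband_sum([1.0, 1.0], [tau, tau + 1.0 / f_c], f_c))
# Second path delayed by half a carrier period: phases oppose.
opposed = abs(baseband_sum([1.0, 1.0], [tau, tau + 0.5 / f_c], f_c))

print(f"extra delay 1/f_c     -> envelope {in_phase:.6f}")
print(f"extra delay 1/(2 f_c) -> envelope {opposed:.6f}")
```

A delay change of about half a nanosecond is enough here to move the envelope from its maximum to a deep null, which is why small antenna displacements produce large amplitude variations.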
When the received signal is made up of multiple reflective rays plus a 
significant line-of-sight (non-faded) component, the received envelope 
amplitude has a Rician pdf as below, and the fading is referred to as Rician 
fading 


$p(r_0) = \frac{r_0}{\sigma^2} \exp\left[-\frac{r_0^2 + A^2}{2\sigma^2}\right] I_0\left(\frac{r_0 A}{\sigma^2}\right)$ for $r_0 \ge 0$, $A \ge 0$; $p(r_0) = 0$ otherwise (5) 


The parameter $\sigma^2$ is the pre-detection mean power of the multipath signal. 
A denotes the peak magnitude of the non-faded signal component and 
$I_0(\cdot)$ is the modified Bessel function of the first kind and zero order. The Rician distribution is often 
described in terms of a parameter K, which is defined as the ratio of the 
power in the specular component to the power in the multipath signal. It is 
given by $K = A^2 / (2\sigma^2)$. 


When the magnitude of the specular component A approaches zero, the 
Rician pdf approaches a Rayleigh pdf, shown as 


$p(r_0) = \frac{r_0}{\sigma^2} \exp\left[-\frac{r_0^2}{2\sigma^2}\right]$ for $r_0 \ge 0$; $p(r_0) = 0$ otherwise (6) 


The Rayleigh pdf results from having no specular signal component; it 
represents the pdf associated with the worst case of fading per mean 
received signal power. 


Small-scale fading manifests itself in two mechanisms: time spreading of the signal 
(or signal dispersion) and time-variant behavior of the channel (figure 2). It 
is important to distinguish between two different time references: delay 
time $\tau$ and transmission time t. Delay time refers to the time spreading 
effect resulting from the fading channel’s non-optimum impulse response. 
The transmission time, however, is related to the motion of the antenna or 
spatial changes, accounting for propagation path changes that are perceived 
as the channel’s time-variant behavior. 


[Figure: Small-scale fading mechanisms and degradation categories. 
Time-spreading mechanisms due to multipath (viewed in the time-delay domain and in the frequency domain): 
frequency-selective fading (ISI distortion, pulse mutilation, irreducible BER) when the multipath delay spread exceeds the symbol time; 
flat fading (loss in SNR) when the multipath delay spread is less than the symbol time. 
Time-variant mechanisms due to motion (viewed in the Doppler-shift domain and in the time domain): 
fast fading (high Doppler, PLL failure, irreducible BER) when the channel fading rate exceeds the symbol rate; 
slow fading (low Doppler, loss in SNR) when the channel fading rate is less than the symbol rate.] 


Signal Time-Spreading 
Signal Time-Spreading Viewed in the Time-Delay Domain 


A simple way to model the fading phenomenon is the notion of 
wide-sense stationary uncorrelated scattering (WSSUS). The model treats signals arriving at a 
receive antenna with different delays as uncorrelated. 


In Figure 1(a), a multipath-intensity profile $S(\tau)$ is plotted. $S(\tau)$ helps us 
understand how the average received power varies as a function of time delay 
$\tau$. The term “time delay” is used to refer to the excess delay. It represents 
the signal’s propagation delay that exceeds the delay of the first signal 
arrival at the receiver. In a wireless channel, the received signal usually 
consists of several discrete multipath components, which give rise to $S(\tau)$. For a single 
transmitted impulse, the time $T_m$ between the first and last received 
component represents the maximum excess delay. 
Degradation Categories due to Signal Time-Spreading Viewed in the 
Time-Delay Domain 


In a fading channel, the relationship between maximum excess delay time 
$T_m$ and symbol time $T_s$ can be viewed in terms of two different degradation 
categories: frequency-selective fading and frequency nonselective or flat 
fading. 


A channel is said to exhibit frequency selective fading if $T_m > T_s$. This 
condition occurs whenever the received multipath components of a symbol 
extend beyond the symbol’s time duration. In fact, another name for this 
category of fading degradation is channel-induced ISI. In this case of 
frequency-selective fading, mitigating the distortion is possible because 
many of the multipath components are resolvable by the receiver. 


A channel is said to exhibit frequency nonselective or flat fading if 
$T_m < T_s$. In this case, all of the received multipath components of a 
symbol arrive within the symbol time duration; hence, the components are 
not resolvable. There is no channel-induced ISI distortion because the 
signal time spreading does not result in significant overlap among 
neighboring received symbols. 


Signal Time-Spreading Viewed in the Frequency Domain 


A completely analogous characterization of signal dispersion can be 
specified in the frequency domain. In figure 1b, the spaced-frequency 
correlation function $|R(\Delta f)|$ can be seen; it is the Fourier transform of 
$S(\tau)$. The correlation function $|R(\Delta f)|$ represents the correlation between 
the channel's responses to two signals as a function of the frequency 
difference between the two signals. The function $|R(\Delta f)|$ helps answer the 
question: what is the correlation between received signals that are spaced in frequency by 
$\Delta f = f_1 - f_2$? $|R(\Delta f)|$ can be measured by transmitting a pair 
of sinusoids separated in frequency by $\Delta f$, cross-correlating the complex 
spectra of the two separately received signals, and repeating the process many 
times with ever-larger separation $\Delta f$. The coherence bandwidth $f_0$ is the 
range of frequencies over which the channel response remains strongly correlated; 
spectral components in that range are 
affected by the channel in a similar manner. Note that the coherence 
bandwidth $f_0$ and the maximum excess delay time $T_m$ are related 
approximately by 


$f_0 \approx \frac{1}{T_m}$ (1) 


A more useful parameter is the delay spread, most often characterized in 
terms of its root-mean-square (rms) value, which can be calculated as 


$\sigma_\tau = \left(\overline{\tau^2} - (\bar{\tau})^2\right)^{1/2}$ (2) 


Where $\bar{\tau}$ is the mean excess delay, $(\bar{\tau})^2$ is the mean squared, $\overline{\tau^2}$ is the 
second moment, and $\sigma_\tau$ is the square root of the second central moment of 
$S(\tau)$. 


An exact relationship between coherence bandwidth and delay spread does not 
exist. However, using Fourier transform techniques an approximation can 
be derived from actual signal dispersion measurements in various channels. 
Several approximate relationships have been developed. 


If the coherence bandwidth is defined as the frequency interval over which 
the channel’s complex frequency transfer function has a correlation of at 
least 0.9, the coherence bandwidth is approximately 


$f_0 \approx \frac{1}{50\,\sigma_\tau}$ (3) 


With the dense-scatterer channel model, if the coherence bandwidth is defined as 
the frequency interval over which the channel’s complex frequency transfer 
function has a correlation of at least 0.5, it is approximately 


$f_0 \approx \frac{1}{5\,\sigma_\tau}$ (4) 


Studies involving ionospheric effects often employ the following definition 
$f_0 = \frac{1}{2\pi\,\sigma_\tau}$ (5) 


The delay spread and coherence bandwidth are related to a channel’s 
multipath characteristic, differing for different propagation paths. It is 
important to note that all parameters in the last equation are independent of 
signaling speed; a system’s signaling speed only influences its transmission 
bandwidth W. 
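
The rms delay spread and the approximate coherence bandwidths can be computed from a discrete multipath-intensity profile, as in the Python sketch below. The two-ray profile (equal power at 0 and 1 microsecond) is a made-up example, and the powers are assumed to be on a linear scale, not in dB.

```python
import math

def rms_delay_spread(delays_s, powers):
    """sigma_tau = sqrt(second moment - mean^2) of a power-weighted delay profile."""
    total = sum(powers)
    mean = sum(p * t for p, t in zip(powers, delays_s)) / total
    second_moment = sum(p * t * t for p, t in zip(powers, delays_s)) / total
    return math.sqrt(second_moment - mean ** 2)

def coherence_bandwidth(sigma_tau, definition="0.9"):
    """Approximate f_0 from sigma_tau for the three definitions in the text."""
    if definition == "0.9":            # correlation of at least 0.9
        return 1.0 / (50.0 * sigma_tau)
    if definition == "0.5":            # correlation of at least 0.5
        return 1.0 / (5.0 * sigma_tau)
    if definition == "ionospheric":
        return 1.0 / (2.0 * math.pi * sigma_tau)
    raise ValueError("unknown definition")

delays = [0.0, 1e-6]     # excess delays in seconds (hypothetical profile)
powers = [1.0, 1.0]      # linear-scale powers (hypothetical profile)
sigma_tau = rms_delay_spread(delays, powers)
print(f"rms delay spread   = {sigma_tau * 1e6:.2f} microseconds")
print(f"f_0 (0.9 corr.)    = {coherence_bandwidth(sigma_tau, '0.9') / 1e3:.1f} kHz")
print(f"f_0 (0.5 corr.)    = {coherence_bandwidth(sigma_tau, '0.5') / 1e3:.1f} kHz")
```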


Degradation Categories due to Signal Time-Spreading Viewed in the 
Frequency Domain 


A channel is referred to as frequency-selective if $f_0 < 1/T_s \approx W$ (the 
symbol rate is taken to be equal to the signaling rate or signal bandwidth 
W). Frequency selective fading distortion occurs whenever a signal’s 
spectral components are not all affected equally by the channel. Some of the 
signal’s spectral components falling outside the coherence bandwidth will be 
affected differently, compared with those components contained within the 
coherence bandwidth (Figure 2(a)). 


Frequency-nonselective or flat-fading degradation occurs whenever 
$f_0 > W$. Hence, all of the signal’s spectral components will be affected by 
the channel in a similar manner (fading or non-fading) (Figure 2(b)). Flat 
fading does not introduce channel-induced ISI distortion, but performance 
degradation can still be expected due to the loss in SNR whenever the 
signal is fading. In order to avoid channel-induced ISI distortion, the 
channel is required to exhibit flat fading. This occurs provided that 


$$f_0 > W \approx \frac{1}{T_s} \qquad (6)$$


Hence, the channel coherence bandwidth $f_0$ sets an upper limit on the 
transmission rate that can be used without incorporating an equalizer in the 
receiver. 


However, as a mobile radio changes its position, there will be times when 
the received signal experiences frequency-selective distortion even though 
$f_0 > W$ (Figure 2(c)). This happens when a null of the channel 
frequency-transfer function falls at the center of the signal band. When 
this occurs, the baseband pulse can be especially mutilated by deprivation 
of its low-frequency components. Thus, even though a channel is categorized 
as flat-fading, it can still manifest frequency-selective fading on occasion. 


[Figure 2: Spectral density of the transmitted signal (bandwidth W) and the 
channel frequency-transfer function (coherence bandwidth $f_0$) versus 
frequency. (a) Typical frequency-selective fading case ($f_0 < W$). 
(b) Typical flat-fading case ($f_0 > W$). (c) Null of channel 
frequency-transfer function occurs at signal band center ($f_0 > W$).] 


Examples of Flat Fading and Frequency-Selective Fading 


The signal dispersion manifestation of the fading channel is analogous to 
the signal spreading that characterizes an electronic filter. Figure 3(a) 
depicts a wideband filter (narrow impulse response) and its effect on a 
signal in both the time domain and the frequency domain. This filter 
resembles a flat-fading channel yielding an output that is relatively free of 
dispersion. Figure 3(b) shows a narrowband filter (wide impulse response). 
The output signal suffers much distortion, as shown in both the time domain 
and the frequency domain. Here the process resembles a frequency-selective 
channel. 


[Figure 3: (a) Flat-fading channel characteristics. (b) Frequency-selective 
fading channel characteristics.] 


Mitigating the Degradation Effects of Fading 


Figure 1 highlights three major performance categories in terms of bit-error 
probability $P_B$ versus $E_b/N_0$. 


[Figure 1: $P_B$ versus $E_b/N_0$ (dB). Curve labels: AWGN; flat fading and 
slow fading (Rayleigh limit); frequency-selective fading or fast-fading 
distortion ($P_B$ can approach 0.5).] 


The leftmost exponentially shaped curve highlights the performance that 
can be expected when using any nominal modulation scheme in AWGN 
interference. Observe that at a reasonable $E_b/N_0$ level, good performance 
can be expected. 


The middle curve, referred to as the Rayleigh limit, shows the performance 
degradation resulting from a loss in $E_b/N_0$ that is characteristic of flat 
fading or slow fading when there is no line-of-sight signal component 
present. The curve is a function of the reciprocal of $E_b/N_0$, so for 
practical values of $E_b/N_0$, performance will generally be “bad.” 


The curve that reaches an irreducible error-rate level, sometimes called an 
error floor, represents “awful” performance, where the bit-error probability 
can level off at values nearly equal to 0.5. This shows the severe 
performance degrading effects that are possible with frequency-selective 
fading or fast fading. 
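

The gap between the AWGN curve and the Rayleigh-limit curve can be illustrated numerically. The sketch below assumes coherent BPSK (the text does not fix a modulation), for which the AWGN bit-error probability is $Q(\sqrt{2E_b/N_0})$ and the average bit-error probability over slow, flat Rayleigh fading is $\frac{1}{2}\left[1 - \sqrt{\Gamma/(1+\Gamma)}\right] \approx 1/(4\Gamma)$, with $\Gamma$ the average $E_b/N_0$. The function names are illustrative.

```python
import math

def ber_bpsk_awgn(ebn0_db):
    """Coherent BPSK bit-error probability in AWGN: Q(sqrt(2 Eb/N0))."""
    ebn0 = 10 ** (ebn0_db / 10)
    return 0.5 * math.erfc(math.sqrt(ebn0))

def ber_bpsk_rayleigh(avg_ebn0_db):
    """Coherent BPSK averaged over slow, flat Rayleigh fading (no line of sight)."""
    gamma = 10 ** (avg_ebn0_db / 10)
    return 0.5 * (1.0 - math.sqrt(gamma / (1.0 + gamma)))

for db in (0, 10, 20, 30):
    print(f"{db:2d} dB  AWGN: {ber_bpsk_awgn(db):.3e}   Rayleigh: {ber_bpsk_rayleigh(db):.3e}")
```

At high average SNR the Rayleigh result falls only inversely with $E_b/N_0$, which is why the middle curve is labeled “bad” while the AWGN curve drops off exponentially.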


If the channel introduces signal distortion as a result of fading, the system 
performance can exhibit an irreducible error rate at a level higher than the 
desired error rate. In such cases, the only approach available for improving 
performance is to use some form of mitigation to remove or reduce the 
signal distortion. 


The mitigation method depends on whether the distortion is caused by 
frequency-selective fading or fast fading. Once the signal distortion has 
been mitigated, the $P_B$ versus $E_b/N_0$ performance can transition from the 
“awful” category to the merely “bad” Rayleigh-limit curve. 


Next, it is possible to further ameliorate the effects of fading and strive to 
approach AWGN system performance by using some form of diversity to 
provide the receiver with a collection of uncorrelated replicas of the signal, 
and by using a powerful error-correction code. 


Figure 2 lists several mitigation techniques for combating the effects of 
both signal distortion and loss in SNR. The mitigation approaches to be 
used when designing a system should be considered in two basic steps: 


1) choose the type of mitigation to reduce or remove any distortion 
degradation; 


2) choose a diversity type that can best approach AWGN system 
performance. 


[Figure 2: Basic mitigation types] 


To combat distortion 


FREQ-SELECTIVE DISTORTION 

• Adaptive equalization (e.g., decision feedback, Viterbi equalizer) 
• Spread spectrum (DS or FH) 
• Orthogonal FDM (OFDM) 
• Pilot signal 


FAST-FADING DISTORTION 

• Robust modulation 
• Signal redundancy to increase signaling rate 
• Coding and interleaving 


To combat loss in SNR 


FLAT-FADING AND SLOW-FADING 

• Some type of diversity to get additional uncorrelated estimates of signal 
• Error-correction coding 


DIVERSITY TYPES 

• Time (e.g., interleaving) 
• Frequency (e.g., BW expansion, spread spectrum FH or DS with Rake receiver) 
• Spatial (e.g., spaced receive antennas) 
• Polarization 


Mitigation to Combat Frequency-Selective Distortion 


Equalization can mitigate the effects of channel-induced ISI brought on by 
frequency-selective fading. It can help modify system performance described 
by the curve that is “awful” to the one that is merely “bad.” The process of 
equalizing for mitigating ISI effects involves using methods to gather the 
dispersed symbol energy back into its original time interval. 


An equalizer is an inverse filter of the channel. If the channel is frequency 
selective, the equalizer enhances the frequency components with small 
amplitudes and attenuates those with large amplitudes. The goal is for the 
combination of channel and equalizer filter to provide a flat composite- 
received frequency response and linear phase. 


Because the channel response varies with time, the equalizer filters must be 
adaptive equalizers. 


The decision feedback equalizer (DFE) involves: 


1) a feedforward section that is a linear transversal filter whose stage length 
and tap weights are selected to coherently combine virtually all of the current 
symbol’s energy. 


2) a feedback section that removes energy remaining from previously 
detected symbols. 


The basic idea behind the DFE is that once an information symbol has been 
detected, the ISI that it induces on future symbols can be estimated and 
subtracted before the detection of subsequent symbols. 
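

The feedback idea can be sketched in a few lines. This is a simplified illustration, not a full DFE: it assumes noise-free BPSK symbols, a hypothetical two-tap channel whose postcursor tap (1.2) is known exactly at the receiver, and a trivial feedforward section (a single unit tap). The function names and tap values are assumptions made for the example.

```python
def channel(symbols, taps):
    """Pass symbols through an FIR channel (taps[0] is the main tap)."""
    return [
        sum(taps[k] * symbols[n - k] for k in range(len(taps)) if n - k >= 0)
        for n in range(len(symbols))
    ]

def threshold_detect(received):
    """Symbol-by-symbol detection that ignores ISI."""
    return [1 if r >= 0 else -1 for r in received]

def dfe_detect(received, postcursor_taps):
    """Subtract the ISI estimated from past decisions, then decide."""
    decisions = []
    for r in received:
        isi = sum(
            tap * decisions[-k]
            for k, tap in enumerate(postcursor_taps, start=1)
            if k <= len(decisions)
        )
        decisions.append(1 if (r - isi) >= 0 else -1)
    return decisions

tx = [1, -1, -1, 1, 1, -1, 1]
taps = [1.0, 1.2]            # hypothetical channel: strong trailing echo
rx = channel(tx, taps)

print("transmitted:", tx)
print("threshold  :", threshold_detect(rx))
print("DFE        :", dfe_detect(rx, taps[1:]))
```

In this noise-free case the feedback section removes the trailing ISI exactly, while the plain threshold detector makes errors. With noise, a wrong decision would feed an incorrect ISI estimate forward, which is the known weakness of decision feedback.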


A maximum-likelihood sequence estimation (MLSE) equalizer tests all 
possible data sequences and chooses the data sequence that is the most 
probable of all the candidates. The MLSE is optimal in the sense that it 
minimizes the probability of a sequence error. Since the MLSE equalizer is 
implemented by using the Viterbi decoding algorithm, it is often referred to 
as the Viterbi equalizer. 


Direct-sequence spread-spectrum (DS/SS) techniques can be used to 
mitigate frequency-selective ISI distortion because the hallmark of 
spread-spectrum systems is their capability of rejecting interference, and 
ISI is a type of interference. 


Consider a DS/SS binary phase-shift keying (PSK) communication channel 
comprising one direct path and one reflected path. Assume that the 
propagation from transmitter to receiver results in a multipath wave that is 
delayed by $\tau$ compared to the direct wave. The received signal, r(t), 
neglecting noise, can be expressed as follows: 


$$r(t) = A\,x(t)\,g(t)\cos(2\pi f_c t) + \alpha A\,x(t-\tau)\,g(t-\tau)\cos(2\pi f_c t + \theta)$$


where x(t) is the data signal, g(t) is the pseudonoise (PN) spreading code, 
and $\tau$ is the differential time delay between the two paths. The angle 
$\theta$ is a random phase, assumed to be uniformly distributed in the range 
$(0, 2\pi)$, and $\alpha$ is the attenuation of the multipath signal relative to 
the direct path signal. 


The receiver multiplies the incoming r(t) by the code g(t). If the receiver is 
synchronized to the direct path signal, multiplication by the code signal 
yields the following: 


$$r(t)\,g(t) = A\,x(t)\,g^2(t)\cos(2\pi f_c t) + \alpha A\,x(t-\tau)\,g(t)\,g(t-\tau)\cos(2\pi f_c t + \theta)$$

where $g^2(t) = 1$. If $\tau$ is greater than the chip duration, then 

$$\left|\int g(t)\,g(t-\tau)\,dt\right| \ll \left|\int g^2(t)\,dt\right|$$


over some appropriate interval of integration (correlation). Thus, the spread 
spectrum system effectively eliminates the multipath interference by virtue 
of its code-correlation receiver. Even though channel-induced ISI is typically 
transparent to DS/SS systems, such systems suffer from the loss in energy 
contained in the multipath components rejected by the receiver. The need to 
gather this lost energy belonging to a received chip was the motivation for 
developing the Rake receiver. 
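

The code-correlation property behind this rejection can be checked with a short PN sequence. The sketch below assumes a 31-chip maximal-length sequence generated by the recurrence $a_{n+5} = a_{n+2} \oplus a_n$ and uses cyclic (periodic) correlation as a stand-in for the integration interval; real systems use far longer codes. The function names are illustrative.

```python
def m_sequence(length=31):
    """31-chip maximal-length PN sequence, mapped to +1/-1 chips."""
    bits = [1, 0, 0, 0, 0]                      # any nonzero seed works
    while len(bits) < length:
        bits.append(bits[-3] ^ bits[-5])        # a[n+5] = a[n+2] XOR a[n]
    return [1 - 2 * b for b in bits]            # bit 0 -> +1, bit 1 -> -1

def cyclic_autocorrelation(code, lag):
    """Sum over one period of g(t) * g(t - lag), with lag in chips."""
    n = len(code)
    return sum(code[i] * code[(i - lag) % n] for i in range(n))

code = m_sequence()
print("lag 0 :", cyclic_autocorrelation(code, 0))
print("lag 1 :", cyclic_autocorrelation(code, 1))
print("lag 5 :", cyclic_autocorrelation(code, 5))
```

A path delayed by one chip or more contributes only the small off-peak correlation, while the synchronized path contributes the full peak. This is the sense in which the correlator suppresses the multipath component.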


A channel that is classified as flat fading can occasionally exhibit 
frequency-selective distortion when the null of the channel’s 
frequency-transfer function occurs at the center of the signal band. The use 
of DS/SS is a practical way of mitigating such distortion because the 
wideband SS signal can span many lobes of the selectively faded channel 
frequency response. This requires the spread-spectrum bandwidth $W_{ss}$ (or 
the chip rate $R_{ch}$) to be greater than the coherence bandwidth $f_0$. The 
larger the ratio of $W_{ss}$ to $f_0$, the more effective will be the mitigation. 


Frequency-hopping spread-spectrum (FH/SS) can be used to mitigate the 
distortion caused by frequency-selective fading, provided that the hopping 
rate is at least equal to the symbol rate. FH receivers avoid the degradation 
effects due to multipath by rapidly changing the transmitter 
carrier-frequency band, thus avoiding the interference by changing the 
receiver band position before the arrival of the multipath signal. 


Orthogonal frequency-division multiplexing (OFDM) can be used for 
signal transmission in frequency-selective fading channels to avoid the use 
of an equalizer by lengthening the symbol duration. The approach is to 
partition (demultiplex) a high symbol-rate sequence into N symbol groups, 
so that each group contains a sequence of a lower symbol rate (by the factor 
1/N) than the original sequence. The signal band is made up of N 
orthogonal carrier waves, and each one is modulated by a different symbol 
group. The goal is to reduce the symbol rate (signaling rate), $W \approx 1/T_s$, 
on each carrier to be less than the channel’s coherence bandwidth $f_0$. 
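

As a rough sizing aid, the smallest number of carriers that brings the per-carrier rate below $f_0$ can be computed directly. This is only a sketch of the inequality $W/N < f_0$; the function name and the example numbers are hypothetical, and a practical design would leave considerable margin rather than use the bare minimum.

```python
import math

def min_subcarriers(total_symbol_rate_hz, coherence_bw_hz):
    """Smallest N such that total_symbol_rate / N is strictly below f0."""
    return math.floor(total_symbol_rate_hz / coherence_bw_hz) + 1

# Hypothetical example: 1 Msymbol/s stream over a channel with f0 = 100 kHz.
n = min_subcarriers(1.0e6, 100.0e3)
print("carriers needed:", n)
print("per-carrier rate (kHz):", 1.0e6 / n / 1e3)
```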


Pilot signal is the name given to a signal intended to facilitate the coherent 
detection of waveforms. Pilot signals can be implemented in the frequency 
domain as in-band tones, or in the time domain as digital sequences that can 
also provide information about the channel state and thus improve 
performance in fading conditions. 


Mitigation to Combat Fast-Fading Distortion 


• For fast-fading distortion, use a robust modulation (non-coherent or 
differentially coherent) that does not require phase tracking, and 
reduce the detector integration time. 

• Increase the symbol rate, $W \approx 1/T_s$, to be greater than the fading 
rate, $f_d \approx 1/T_0$, by adding signal redundancy. 

• Error-correction coding and interleaving can provide mitigation, 
because instead of providing more signal energy, a code reduces the 
required $E_b/N_0$. For a given $E_b/N_0$ with coding present, the error 
floor will be lowered compared to the uncoded case. 


When fast-fading distortion and frequency-selective distortion occur 
simultaneously, the frequency-selective distortion can be mitigated by the 
use of an OFDM signal set. Fast fading, however, will typically degrade 
conventional OFDM because the Doppler spreading corrupts the 
orthogonality of the OFDM subcarriers. A polyphase filtering technique is 
used to provide time-domain shaping and partial-response coding to reduce 
the spectral sidelobes of the signal set, and thus help preserve its 
orthogonality. The process introduces known ISI and adjacent channel 
interference (ACI) which are then removed by a post-processing equalizer 
and canceling filter. 


Mitigation to Combat Loss in SNR 


Until this point, we have considered the mitigation to combat frequency- 
selective and fast-fading distortions. The next step is to use diversity 
methods to move the system operating point from the error-performance 
curve labeled as “bad” to a curve that approaches AWGN performance. The 
term diversity is used to denote the various methods available for providing 
the receiver with uncorrelated renditions of the signal of interest. Some of 
the ways in which diversity methods can be implemented are: 


• Time diversity: transmit the signal on L different time slots with time 
separation of at least $T_0$. When used along with error-correction coding, 
interleaving is a form of time diversity. 


• Frequency diversity: transmit the signal on L different carriers with 
frequency separation of at least $f_0$. Bandwidth expansion is a form of 
frequency diversity. The signal bandwidth W is expanded so as to be 
greater than $f_0$, thus providing the receiver with several 
independently fading signal replicas. This achieves frequency diversity of 
the order $L = W/f_0$. 


Whenever W is made larger than fo, there is the potential for frequency- 
selective distortion unless mitigation in the form of equalization is 
provided. 


Thus, an expanded bandwidth can improve system performance (via 
diversity) only if the frequency-selective distortion that the diversity may 
have introduced is mitigated. 


• Spread spectrum: In spread-spectrum systems, the delayed signals do not 
contribute to the fading, but to interchip interference. Spread spectrum is a 
bandwidth-expansion technique that excels at rejecting interfering signals. 
In the case of Direct-Sequence Spread-Spectrum (DS/SS), multipath 
components are rejected if they are time-delayed by more than the duration 
of one chip. However, in order to approach AWGN performance, it is 
necessary to compensate for the loss in energy contained in those rejected 
components. The Rake receiver makes it possible to coherently combine 
the energy from several of the multipath components arriving along 
different paths (with sufficient differential delay). 


• Frequency-hopping spread-spectrum (FH/SS) is sometimes used as a 
diversity mechanism. The GSM system uses slow FH (217 hops/s) to 
compensate for cases in which the mobile unit is moving very slowly (or 
not at all) and experiences deep fading due to a spectral null. 


• Spatial diversity is usually accomplished through the use of multiple 
receive antennas, separated by a distance of at least 10 wavelengths when 
located at a base station (and less when located at a mobile unit). 
Signal-processing techniques must be employed to choose the best antenna 
output or to coherently combine all the outputs. Systems have also been 
implemented with multiple transmitters, each at a different location. 


• Polarization diversity is yet another way to achieve additional 
uncorrelated samples of the signal. 


• Some techniques for improving the loss in SNR in a fading channel are 
more efficient and more powerful than repetition coding. 


Error-correction coding represents a unique mitigation technique, because 
instead of providing more signal energy it reduces the required $E_b/N_0$ 
needed to achieve a desired performance level. Error-correction coding 
coupled with interleaving is probably the most prevalent of the mitigation 
schemes used to provide improved system performance in a fading 
environment. 


Diversity Techniques 


This section shows the error-performance improvements that can be 
obtained with the use of diversity techniques. 


The bit-error probability, $\bar{P}_B$, averaged through all the “ups and 
downs” of the fading experience in a slow-fading channel is as follows: 


$$\bar{P}_B = \int_0^\infty P_B(x)\,p(x)\,dx$$


where $P_B(x)$ is the bit-error probability for a given modulation scheme at a 
specific value of SNR = x, where $x = \alpha^2 E_b/N_0$, and p(x) is the pdf of x 
due to the fading conditions. With $E_b$ and $N_0$ constant, $\alpha$ is used to 
represent the amplitude variations due to fading. 


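For Rayleigh fading, $\alpha$ has a Rayleigh distribution so that $\alpha^2$, and 
consequently x, have a chi-squared distribution: 


$$p(x) = \frac{1}{\Gamma}\exp\left(-\frac{x}{\Gamma}\right), \qquad x \ge 0$$


where $\Gamma = \overline{\alpha^2}\,E_b/N_0$ is the SNR averaged through the 
“ups and downs” of fading. If each diversity (signal) branch, 
i = 1, 2, ..., M, has an instantaneous SNR = $\gamma_i$, and we assume that each 
branch has the same average SNR given by $\Gamma$, then 


$$p(\gamma_i) = \frac{1}{\Gamma}\exp\left(-\frac{\gamma_i}{\Gamma}\right), \qquad \gamma_i \ge 0$$


The probability that a single branch has SNR less than some threshold 
$\gamma$ is: 


$$P(\gamma_i \le \gamma) = \int_0^{\gamma} p(\gamma_i)\,d\gamma_i = \int_0^{\gamma} \frac{1}{\Gamma}\exp\left(-\frac{\gamma_i}{\Gamma}\right)d\gamma_i = 1 - \exp\left(-\frac{\gamma}{\Gamma}\right)$$


The probability that all M independent signal diversity branches are 
received simultaneously with an SNR less than some threshold value $\gamma$ is: 


$$P(\gamma_1, \ldots, \gamma_M \le \gamma) = \left[1 - \exp\left(-\frac{\gamma}{\Gamma}\right)\right]^M$$


The probability that any single branch achieves SNR > $\gamma$ is: 


$$P(\gamma_i > \gamma) = 1 - \left[1 - \exp\left(-\frac{\gamma}{\Gamma}\right)\right]^M$$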


This is the probability of exceeding a threshold when selection diversity is 
used. 


Example: Benefits of Diversity 

Assume that four-branch diversity is used, and that each branch receives an 
independently Rayleigh-fading signal. If the average SNR is $\Gamma$ = 20 dB, 
determine the probability that all four branches are received simultaneously 
with an SNR less than 10 dB (and also, the probability that this threshold 
will be exceeded). 

Compare the results to the case when no diversity is used. 


Solution 


With $\gamma$ = 10 dB, and $\gamma/\Gamma$ = 10 dB − 20 dB = −10 dB = 0.1, we solve 
for the probability that the SNR will drop below 10 dB, as follows: 

$$P(\gamma_1, \gamma_2, \gamma_3, \gamma_4 \le 10\ \text{dB}) = [1 - \exp(-0.1)]^4 = 8.2 \times 10^{-5}$$

or, using selection diversity, we can say that 

$$P(\gamma_i > 10\ \text{dB}) = 1 - 8.2 \times 10^{-5} = 0.9999$$

Without diversity, 

$$P(\gamma_1 \le 10\ \text{dB}) = [1 - \exp(-0.1)]^1 = 0.095$$

$$P(\gamma_1 > 10\ \text{dB}) = 1 - 0.095 = 0.905$$
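

The numbers in this example can be reproduced with a few lines of code. The sketch below simply evaluates $[1 - \exp(-\gamma/\Gamma)]^M$ for independent Rayleigh-fading branches with equal average SNR; the function name is illustrative.

```python
import math

def prob_all_below(threshold_db, avg_snr_db, branches):
    """P(all M independent Rayleigh branches have SNR below the threshold)."""
    ratio = 10 ** ((threshold_db - avg_snr_db) / 10)   # gamma / Gamma, linear
    return (1.0 - math.exp(-ratio)) ** branches

for m in (1, 2, 4):
    p_below = prob_all_below(10.0, 20.0, m)
    print(f"M = {m}: P(all below) = {p_below:.3e}, P(exceed) = {1 - p_below:.5f}")
```

Going from one branch to four reduces the probability of falling below the threshold by roughly three orders of magnitude in this case.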


Diversity-Combining Techniques 


The most common techniques for combining diversity signals are selection, 
feedback, maximal ratio, and equal gain. 


Selection combining used in spatial diversity systems involves the sampling 
of M antenna signals, and sending the largest one to the demodulator. 
Selection-diversity combining is relatively easy to implement but not 
optimal because it does not make use of all the received signals 
simultaneously. 


With feedback or scanning diversity, the M signals are scanned in a fixed 
sequence until one is found that exceeds a given threshold. This one 
becomes the chosen signal until it falls below the established threshold, and 
the scanning process starts again. The error performance of this technique is 
somewhat inferior to the other methods, but feedback is quite simple to 
implement. 


In maximal-ratio combining, the signals from all of the M branches are 
weighted according to their individual SNRs and then summed. The 
individual signals must be cophased before being summed. 


Maximal-ratio combining produces an average SNR $\bar{\gamma}_M$ equal to the 
sum of the individual average SNRs, as shown below: 


$$\bar{\gamma}_M = \sum_{i=1}^{M} \bar{\gamma}_i = \sum_{i=1}^{M} \Gamma = M\Gamma$$


where we assume that each branch has the same average SNR given by 
$\bar{\gamma}_i = \Gamma$. 


Thus, maximal-ratio combining can produce an acceptable average SNR, 
even when none of the individual $\bar{\gamma}_i$ is acceptable. It uses each of 
the branches in a cophased and weighted manner such that the largest 
possible SNR is available at the receiver. 


Equal-gain combining is similar to maximal-ratio combining except that 
the weights are all set to unity. The possibility of achieving an acceptable 
output SNR from a number of unacceptable inputs is still retained. The 
performance is marginally inferior to maximal-ratio combining. 
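

The three combining rules can be compared for one snapshot of branch SNRs. This is a minimal sketch under stated assumptions: the branch signals are perfectly cophased, every branch has the same noise power, and SNRs are given as linear ratios (not dB). Under those assumptions the output SNR is the maximum branch SNR for selection, the sum of branch SNRs for maximal-ratio combining, and $(\sum_i \sqrt{\gamma_i})^2 / M$ for equal-gain combining. The function names are illustrative.

```python
import math

def selection_snr(branch_snrs):
    """Selection combining: keep only the strongest branch."""
    return max(branch_snrs)

def maximal_ratio_snr(branch_snrs):
    """Maximal-ratio combining: output SNR is the sum of branch SNRs."""
    return sum(branch_snrs)

def equal_gain_snr(branch_snrs):
    """Equal-gain combining: unit weights, amplitudes add, noise powers add."""
    m = len(branch_snrs)
    return sum(math.sqrt(g) for g in branch_snrs) ** 2 / m

snrs = [1.0, 4.0]   # hypothetical instantaneous branch SNRs (linear)
print("selection    :", selection_snr(snrs))
print("maximal ratio:", maximal_ratio_snr(snrs))
print("equal gain   :", equal_gain_snr(snrs))
```

Maximal-ratio combining never does worse than the other two for a given snapshot, which is consistent with its description as the combiner that yields the largest possible SNR.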


Modulation Types for Fading Channels 


An amplitude-based signaling scheme such as amplitude shift keying 
(ASK) or quadrature amplitude modulation (QAM) is inherently 
vulnerable to performance degradation in a fading environment. Thus, for 
fading channels, the preferred choice for a signaling scheme is a frequency 
or phase-based modulation type. 


In considering orthogonal FSK modulation for fading channels, the use of 
MFSK with M = 8 or larger is useful because its error performance is better 
than binary signaling. In slow Rayleigh fading channels, binary DPSK 
and 8-FSK perform within 0.1 dB of each other. 


In considering PSK modulation for fading channels, higher-order 
modulation alphabets perform poorly. MPSK with M = 8 or larger should 
be avoided. 


Example: Phase Variations in a Mobile Communication System 


The Doppler spread $f_d = V/\lambda$ shows that the fading rate is a direct 
function of velocity. Table 1 shows the Doppler spread versus vehicle 
speed at carrier frequencies of 900 MHz and 1800 MHz. Calculate the 
phase variation per symbol for the case of signaling with QPSK modulation 
at the rate of 24.3 kilosymbols/s. 


Assume that the carrier frequency is 1800 MHz and that the velocity of the 
vehicle is 50 miles/hr (80 km/hr). Repeat for a vehicle speed of 100 
miles/hr. 


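Table 1: Doppler spread versus vehicle speed 


Velocity    Velocity    Doppler (Hz)          Doppler (Hz) 
miles/hr    km/hr       900 MHz (λ = 33 cm)   1800 MHz (λ = 16.6 cm) 

3           5           4                     8 
20          32          27                    54 
50          80          66                    132 
80          128         106                   212 
120         192         160                   320 


Solution 

At a velocity of 50 miles/hr: 

$$\Delta\theta/\text{symbol} = \frac{f_d}{R_s} \times 360^\circ = \frac{132\ \text{Hz}}{24.3 \times 10^3\ \text{symbols/s}} \times 360^\circ \approx 2^\circ/\text{symbol}$$

where $R_s$ is the symbol rate. 


At a velocity of 100 miles/hr: $\Delta\theta/\text{symbol} \approx 4^\circ/\text{symbol}$ 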


Thus, it should be clear why MPSK with a value of M > 4 is not generally 
used to transmit information in a multipath environment. 
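

The same calculation can be scripted. The sketch below computes the Doppler spread from $f_d = V/\lambda$ with $\lambda = c/f_c$ and then the phase change per symbol. It uses $c = 3 \times 10^8$ m/s and an exact miles-to-meters conversion, so the Doppler values differ slightly from the rounded table entries; the function names are illustrative.

```python
C_MPS = 3.0e8            # speed of light, m/s
MPH_TO_MPS = 0.44704     # one mile per hour in meters per second

def doppler_spread_hz(speed_mps, carrier_hz):
    """f_d = V / lambda, with lambda = c / f_c."""
    return speed_mps * carrier_hz / C_MPS

def phase_change_deg_per_symbol(speed_mps, carrier_hz, symbol_rate):
    """Phase rotation accumulated over one symbol interval, in degrees."""
    return doppler_spread_hz(speed_mps, carrier_hz) / symbol_rate * 360.0

for mph in (50, 100):
    v = mph * MPH_TO_MPS
    fd = doppler_spread_hz(v, 1800e6)
    dtheta = phase_change_deg_per_symbol(v, 1800e6, 24.3e3)
    print(f"{mph:3d} miles/hr: f_d = {fd:6.1f} Hz, phase change = {dtheta:.2f} deg/symbol")
```

A rotation of a few degrees per symbol is a noticeable fraction of the 22.5-degree decision margin of 8-PSK, which is why dense phase constellations are avoided.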


The Role of an Interleaver 


The primary benefit of an interleaver for transmission in fading 
environment is to provide time diversity (when used along with error- 
correction coding). 


Figure 1 illustrates the benefits of providing an interleaver time span 
$T_{IL}$ that is large compared to the channel coherence time $T_0$, for the 
case of DBPSK modulation with soft-decision decoding of a rate 1/2, K = 7 
convolutional code, over a slow Rayleigh-fading channel. 


[Figure 1: Decoded bit error rate versus demodulated bit error rate. 
DBPSK modem; slow Rayleigh fading; convolutional encoding, rate 1/2, 
K = 7, (133, 171) octal; soft Viterbi decoding, 32-bit paths.] 

It should be apparent that an interleaver having the largest ratio of 
$T_{IL}/T_0$ is the best-performing (large demodulated BER leading to small 
decoded BER). This leads to the conclusion that $T_{IL}/T_0$ should be some 
large number, say 1,000 or 10,000. However, in a real-time communication 
system this is not possible because the inherent time delay associated with 
an interleaver would be excessive. 


The previous section shows that for a cellular telephone system with a 
carrier frequency of 900 MHz, a $T_{IL}/T_0$ ratio of 10 is about as large as 
one can implement without suffering excessive delay. 


Note that the interleaver provides no benefit against multipath unless there 
is motion between the transmitter and receiver (or motion of objects within 
the signal-propagating paths). The system error performance over a fading 
channel typically degrades with increased speed because of the increase in 
Doppler spread or fading rapidity. However, the action of an interleaver in 
the system provides mitigation, which becomes more effective at higher 
speeds. 
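

How an interleaver turns a burst of errors into scattered single errors can be shown with a small block interleaver. This is a sketch of one common structure (write by rows, read by columns); the text does not specify which interleaver type is used, and the 3 x 4 size and function names are chosen only for illustration.

```python
def interleave(symbols, rows, cols):
    """Write row by row into a rows x cols array, read column by column."""
    assert len(symbols) == rows * cols
    return [symbols[r * cols + c] for c in range(cols) for r in range(rows)]

def deinterleave(symbols, rows, cols):
    """Inverse operation: restore the original row-by-row order."""
    assert len(symbols) == rows * cols
    return [symbols[c * rows + r] for r in range(rows) for c in range(cols)]

rows, cols = 3, 4
data = list(range(12))
sent = interleave(data, rows, cols)

# A fade corrupts three consecutive channel positions (3, 4 and 5).
burst = {3, 4, 5}
hit_on_channel = [i in burst for i in range(len(sent))]

# Map the damage back to the original symbol order.
hit_in_data = deinterleave(hit_on_channel, rows, cols)
error_positions = [k for k, hit in enumerate(hit_in_data) if hit]

print("channel order   :", sent)
print("errors land at  :", error_positions)
```

After deinterleaving, the three adjacent channel errors are separated by a full row length, so a code that corrects isolated errors sees them one at a time. The scattering only helps if the fade is shorter than the interleaver span, which is the reason for wanting the span large compared with the coherence time.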


Figure 2 shows that although communications degrade with increased speed 
of the mobile unit (the fading rate increases), the benefit of an interleaver 
is enhanced with increased speed. These are the results of field testing 
performed on a CDMA system meeting the Interim Specification 95 (IS-95) 
over a link comprising a moving vehicle and a base station. 


[Figure 2: Typical $E_b/N_0$ performance versus vehicle speed for 850 MHz 
links to achieve a frame-error rate of 1 percent over a Rayleigh channel 
with two independent paths.] 


The Viterbi Equalizer as Applied to GSM 


The GSM time-division multiple access (TDMA) frame in Figure 1 has a 
duration of 4.615 ms and comprises 8 slots, one assigned to each active 
mobile user. A normal transmission burst occupying one time slot contains 
57 message bits on each side of a 26-bit midamble, called a training or 
sounding sequence. The slot-time duration is 0.577 ms (or the slot rate is 
1733 slots/s). The purpose of the midamble is to assist the receiver in 
estimating the impulse response of the channel adaptively (during the time 
duration of each 0.577 ms slot). For the technique to be effective, the fading 
characteristics of the channel must not change appreciably during the time 
interval of one slot. 


Consider a GSM receiver used aboard a high-speed train, traveling at a 
constant velocity of 200 km/hr (55.56 m/s). Assume the carrier frequency to 
be 900 MHz (the wavelength is $\lambda$ = 0.33 m). The distance corresponding to 
a half-wavelength is traversed in 


$$T_0 \approx \frac{\lambda/2}{V} \approx 3\ \text{ms}$$


which corresponds approximately to the coherence time. Therefore, the 
channel coherence time is more than five times greater than the slot time of 
0.577 ms. The time needed for a significant change in channel fading 
characteristics is relatively long compared to the time duration of one slot. 


The GSM symbol rate (or bit rate, since the modulation is binary) is 271 
kilosymbols/s; the bandwidth, W, is 200 kHz. Since the typical rms delay 
spread $\sigma_\tau$ in an urban environment is on the order of 2 μs, the 
resulting coherence bandwidth is 


$$f_0 \approx \frac{1}{5\,\sigma_\tau} \approx 100\ \text{kHz}$$


Since $f_0 < W$, the GSM receiver must utilize some form of mitigation to 
combat frequency-selective distortion. To accomplish this goal, the Viterbi 
equalizer is typically implemented. 
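

The two GSM checks above (slow fading within a slot, frequency-selective across the band) can be reproduced numerically. The sketch uses the figures quoted in the text and takes $c = 3 \times 10^8$ m/s; the variable names are illustrative.

```python
C_MPS = 3.0e8

carrier_hz = 900e6
speed_mps = 200.0 / 3.6            # 200 km/hr
slot_s = 0.577e-3                  # GSM slot duration
bandwidth_hz = 200e3               # GSM channel bandwidth W
sigma_tau_s = 2e-6                 # typical urban rms delay spread

# Coherence time: time to travel half a wavelength.
wavelength_m = C_MPS / carrier_hz
coherence_time_s = (wavelength_m / 2.0) / speed_mps

# Coherence bandwidth: f0 ~ 1 / (5 sigma_tau).
coherence_bw_hz = 1.0 / (5.0 * sigma_tau_s)

slots_per_coherence_time = coherence_time_s / slot_s
needs_equalizer = coherence_bw_hz < bandwidth_hz

print(f"T0 = {coherence_time_s * 1e3:.2f} ms ({slots_per_coherence_time:.1f} slot times)")
print(f"f0 = {coherence_bw_hz / 1e3:.0f} kHz, W = {bandwidth_hz / 1e3:.0f} kHz")
print("frequency-selective, equalizer needed:", needs_equalizer)
```

The channel is therefore nearly constant over one slot, so a per-slot estimate from the midamble is meaningful, yet selective enough across 200 kHz that equalization is required.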


Figure 2 shows the basic functional blocks used in a GSM receiver for 
estimating the channel impulse response. 


This estimate is used to provide the detector with channel-corrected 
reference waveforms, as explained below. (The Viterbi algorithm is used in 
the final step to compute the MLSE of the message bits.) 


Let $s_{tr}(t)$ be the transmitted midamble training sequence, and $r_{tr}(t)$ 
be the corresponding received midamble training sequence. We have: 


$$r_{tr}(t) = s_{tr}(t) * h_c(t)$$


At the receiver, since $r_{tr}(t)$ is part of the received normal burst, it is 
extracted and sent to a filter having impulse response $h_{mf}(t)$ that is 
matched to $s_{tr}(t)$. This matched filter yields at its output an estimate of 
$h_c(t)$, denoted $h_e(t)$: 


$$h_e(t) = r_{tr}(t) * h_{mf}(t) = s_{tr}(t) * h_c(t) * h_{mf}(t) = R_s(t) * h_c(t)$$


where $R_s(t) = s_{tr}(t) * h_{mf}(t)$ is the autocorrelation function of 
$s_{tr}(t)$. If $s_{tr}(t)$ is designed to have a highly-peaked (impulse-like) 
autocorrelation function $R_s(t)$, then $h_e(t) \approx h_c(t)$. 
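

The matched-filter estimate can be tried on a toy discrete-time example. The sketch below is not the GSM procedure itself: it substitutes a 13-chip Barker sequence for the 26-bit GSM midamble (an assumption made because its autocorrelation is known to be sharply peaked), uses a hypothetical three-tap channel, and ignores noise. The function names are illustrative.

```python
def convolve(a, b):
    """Discrete linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

# Stand-in training sequence with impulse-like autocorrelation (Barker 13).
TRAINING = [1, 1, 1, 1, 1, -1, -1, 1, 1, -1, 1, -1, 1]

def estimate_channel(received_training, training, n_taps):
    """Filter with the time-reversed training sequence and normalize by its energy."""
    matched_filter = training[::-1]
    output = convolve(received_training, matched_filter)
    peak = len(training) - 1                 # index of zero lag
    energy = sum(s * s for s in training)
    return [output[peak + n] / energy for n in range(n_taps)]

h_c = [1.0, 0.5, 0.2]                        # hypothetical channel impulse response
r_tr = convolve(TRAINING, h_c)               # received midamble (noise-free)
h_e = estimate_channel(r_tr, TRAINING, len(h_c))

print("true channel :", h_c)
print("estimate     :", [round(v, 3) for v in h_e])
```

The estimate is close to, but not exactly, the true response: the residual error comes from the small autocorrelation sidelobes, which is why a sharply peaked $R_s(t)$ is wanted.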


Next, we use a windowing function, w(t), to truncate $h_e(t)$ to form a 
computationally affordable function, $h_w(t)$. The time duration of w(t), 
denoted $L_0$, must be large enough to compensate for the effect of typical 
channel-induced ISI. The term $L_0$ consists of the sum of two contributions, 
namely $L_{CISI}$, corresponding to the controlled ISI caused by Gaussian 
filtering of the baseband waveform (which then modulates the carrier using 
MSK), and $L_C$, corresponding to the channel-induced ISI caused by 
multipath propagation. Thus, 


$$L_0 = L_{CISI} + L_C$$


The GSM system is required to mitigate the distortion caused by signal 
dispersion having delay spreads of approximately 15-20 μs. Since in 
GSM the bit duration is 3.69 μs, we can express $L_0$ in units of bit 
intervals. Thus, the Viterbi equalizer used in GSM has a memory of 4-6 bit 
intervals. For each $L_0$-bit interval in the message, the function of the 
Viterbi equalizer is to find the most likely $L_0$-bit sequence out of the 
$2^{L_0}$ possible sequences that might have been transmitted. 


Determining the most likely transmitted $L_0$-bit sequence requires that 
$2^{L_0}$ meaningful reference waveforms be created by disturbing the 
$2^{L_0}$ ideal waveforms (generated at the receiver) in the same way that the 
channel has disturbed the transmitted slot. Therefore, the $2^{L_0}$ reference 
waveforms are convolved with the windowed estimate of the channel impulse 
response, $h_w(t)$, in order to generate the disturbed or so-called 
channel-corrected reference waveforms. 


Next, the channel-corrected reference waveforms are compared against the 
received data waveforms to yield metric calculations. However, before the 
comparison takes place, the received data waveforms are convolved with 
the known windowed autocorrelation function $w(t)R_s(t)$, transforming 
them in a manner comparable to the transformation applied to the reference 
waveforms. This filtered message signal is compared to all possible 
$2^{L_0}$ channel-corrected reference signals, and metrics are computed in a 
manner similar to that used in the Viterbi decoding algorithm. It yields the 
maximum likelihood estimate of the transmitted data sequence. 
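

The comparison against channel-corrected references can be sketched as an exhaustive search. This is deliberately not the Viterbi algorithm: it enumerates all $2^{L_0}$ candidates directly, which is only practical for very short sequences, and it omits the extra filtering by $w(t)R_s(t)$. The channel taps, the 4-bit block, and the small disturbance values are hypothetical.

```python
from itertools import product

def convolve(a, b):
    """Discrete linear convolution of two sequences."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out

def mlse_exhaustive(received, h_w, n_bits):
    """Pick the +/-1 sequence whose channel-corrected reference is closest to the received block."""
    best_sequence, best_metric = None, float("inf")
    for candidate in product((-1, 1), repeat=n_bits):
        reference = convolve(candidate, h_w)          # channel-corrected reference
        metric = sum((r - c) ** 2 for r, c in zip(received, reference))
        if metric < best_metric:
            best_sequence, best_metric = candidate, metric
    return list(best_sequence)

h_w = [1.0, 0.6, 0.3]                      # hypothetical windowed channel estimate
tx = [1, -1, -1, 1]
disturbance = [0.1, -0.2, 0.15, -0.1, 0.05, 0.1]
rx = [x + d for x, d in zip(convolve(tx, h_w), disturbance)]

print("transmitted:", tx)
print("detected   :", mlse_exhaustive(rx, h_w, len(tx)))
```

The Viterbi algorithm reaches the same decision without visiting every candidate, by discarding partial sequences that cannot be part of the best path.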


The Rake Receiver Applied to Direct-Sequence Spread-Spectrum (DS/SS) 
Systems 


Interim Specification 95 (IS-95) describes a Direct-Sequence Spread- 
Spectrum (DS/SS) cellular system that uses a Rake receiver to provide 
path diversity for mitigating the effects of frequency-selective fading. The 
Rake receiver searches through the different multipath delays for code 
correlation and thus recovers delayed signals that are then optimally 
combined with the output of other independent correlators. 


Figure 1 shows the power profiles associated with the five chip 
transmissions of the code sequence 1 0 1 1 1. Each abscissa shows three 
components arriving with delays $\tau_1$, $\tau_2$, and $\tau_3$. Assume that the 
intervals between the transmission times $t_i$ and the intervals between the 
delay times $\tau_i$ are each one chip in duration. The component arriving at 
the receiver at time $t_{-4}$, with delay $\tau_3$, is time-coincident with two 
others, namely the components arriving at times $t_{-3}$ and $t_{-2}$ with 
delays $\tau_2$ and $\tau_1$ respectively. Since in this example the delayed 
components are separated by at least one chip time, they can be resolved. 


At the receiver, there must be a sounding device dedicated to estimating the 
$\tau_i$ delay times. Note that the fading rate in a mobile radio system is 
relatively slow (on the order of milliseconds), that is, the channel 
coherence time is large compared to the chip time duration ($T_0 > T_{ch}$). 
Hence, the changes in $\tau_i$ occur slowly enough that the receiver can 
readily adapt to them. 


Once the 7; delays are estimated, a separate correlator is dedicated to 
recovering each resolvable multipath component. In this example, there 
would be three such dedicated correlators, each one processing a delayed 
version of the same chip sequence 1 0 1 1 1. Each correlator receives chips 
with power profiles represented by the sequence of components shown 
along a diagonal line. For simplicity, the chips are all shown as positive 
signaling elements. In reality, these chips form a pseudonoise (PN) 
sequence, which of course contains both positive and negative pulses. Each 
correlator attempts to correlate these arriving chips with the same 
appropriately synchronized PN code. At the end of a symbol interval 
(typically there may be hundreds or thousands of chips per symbol), the 
outputs of the correlators are coherently combined, and a symbol detection 
is made. 


The interference-suppression capability of DS/SS systems stems from the 
fact that a code sequence arriving at the receiver time-shifted by merely one 
chip will have very low correlation to the particular PN code with which the 
sequence is correlated. Therefore, any code chips that are delayed by one or 
more chip times will be suppressed by the correlator. The delayed chips 
only contribute to raising the interference level (correlation sidelobes). 


The mitigation provided by the Rake receiver can be termed path diversity, 
since it allows the energy of a chip that arrives via multiple paths to be 
combined coherently. Without the Rake receiver, this energy would be 
transparent and therefore lost to the DS/SS receiver. 
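

The path-diversity idea can be illustrated with a three-finger toy model. The sketch makes several simplifying assumptions that are not part of the text: one symbol spans exactly one period of a 31-chip maximal-length code, path delays are whole chips and are modeled as cyclic shifts, path gains are real and known, and noise is ignored. All names and numbers are hypothetical.

```python
def pn31():
    """31-chip maximal-length PN code as +1/-1 chips (a[n+5] = a[n+2] XOR a[n])."""
    bits = [1, 0, 0, 0, 0]
    while len(bits) < 31:
        bits.append(bits[-3] ^ bits[-5])
    return [1 - 2 * b for b in bits]

def delayed(code, d):
    """Code delayed by d chips (cyclic model)."""
    n = len(code)
    return [code[(i - d) % n] for i in range(n)]

def received_symbol(bit, code, delays, gains):
    """Sum of multipath replicas of one spread symbol (bit is +1 or -1)."""
    paths = [delayed(code, d) for d in delays]
    return [bit * sum(g * p[i] for g, p in zip(gains, paths)) for i in range(len(code))]

def finger_output(received, code, d):
    """One Rake finger: correlate with the code aligned to delay d."""
    return sum(r * c for r, c in zip(received, delayed(code, d)))

def rake_decision(received, code, delays, gains):
    """Weight each finger by its path gain, sum, and decide the symbol."""
    combined = sum(g * finger_output(received, code, d) for d, g in zip(delays, gains))
    return (1 if combined >= 0 else -1), combined

code = pn31()
delays, gains = [0, 1, 2], [1.0, 0.6, 0.3]
rx = received_symbol(-1, code, delays, gains)

single = finger_output(rx, code, 0)
decision, combined = rake_decision(rx, code, delays, gains)
print("single finger output:", round(single, 2))
print("Rake combined output:", round(combined, 2), "-> decision", decision)
```

The combined statistic is larger in magnitude than that of the single synchronized finger, reflecting the energy recovered from the two delayed paths that a single correlator would have suppressed.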


